Local Genomic Relationship Matrices

Locally Epistatic Genetic Relationship Matrices for Improved Genomic Association, Prediction and Selection

Deniz Akdemir

Cornell University, Plant Breeding & Genetics


January 24, 2013

Deniz Akdemir (Plant Breeding & Genetics)Locally Epistatic Genetic Relationship Matrices January 24, 2013 1 / 42

Semi-Parametric Mixed Model

Selection in animal or plant breeding is usually based on estimates of genetic breeding values (GEBV) obtained with semi-parametric mixed models (SPMM).

A SPMM for the n × 1 response vector y is expressed as

y = Xβ + Zg + e (1)

where X is the n × p design matrix for the fixed effects, β is a p × 1 vector of fixed effects coefficients, and Z is the n × q design matrix for the random effects. The random effects (g′, e′)′ are assumed to follow a multivariate normal distribution with mean 0 and covariance

$$\begin{pmatrix} \sigma_g^2 K & 0 \\ 0 & \sigma_e^2 I_n \end{pmatrix}$$

where K is a q × q kernel matrix.


Kernel Functions

A kernel function k(., .) maps a pair of input points x and x′ to a real number.

A kernel function is by definition symmetric (k(x, x′) = k(x′, x)) and non-negative definite.

Given the inputs for the n individuals, we can compute a kernel matrix K whose entries are Kij = k(xi, xj).

Linear kernel function: k(x, y) = x′y.

Polynomial kernel function: k(x, y) = (x′y + c)^d for c, d ∈ R.

Gaussian kernel function: k(x, y) = exp(−(x − y)′(x − y)/h), where h > 0.
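The three kernels can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the slides: the function names, the default values of c, d and h, and the toy marker codes are all assumptions made for the example.

```python
import math

def linear_kernel(x, y):
    # k(x, y) = x'y
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    # k(x, y) = (x'y + c)^d
    return (linear_kernel(x, y) + c) ** d

def gaussian_kernel(x, y, h=1.0):
    # k(x, y) = exp(-(x - y)'(x - y) / h), with h > 0
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / h)

def kernel_matrix(rows, kernel):
    # K_ij = k(x_i, x_j) for every pair of individuals
    return [[kernel(xi, xj) for xj in rows] for xi in rows]

# Hypothetical marker codes for three individuals (rows) and three markers.
markers = [[1, 0, -1], [1, 1, 0], [-1, 0, 1]]
K_linear = kernel_matrix(markers, linear_kernel)
K_gaussian = kernel_matrix(markers, gaussian_kernel)
```

Each matrix is symmetric by construction, and the Gaussian kernel matrix has ones on its diagonal.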


Kernels, Heritability, Breeding value

Taylor expansions of these kernel functions reveal that each of these kernels corresponds to a different feature map.

For marker-based SPMMs, a genetic kernel matrix calculated using a linear kernel incorporates only the additive effects of markers.

A genetic kernel matrix based on the polynomial kernel of order k incorporates all monomials of the markers of order one to k in an additive fashion.

The Gaussian kernel function allows us to implicitly incorporate the additive and complex epistatic effects of the markers.



Simulation studies and results from empirical experiments show that the prediction accuracies of models with a Gaussian or polynomial kernel are usually better than those of models with a linear kernel.

However, it is not possible to know how much of the increase in accuracy can be transferred into better generations, because some of the predicted epistatic effects will be lost through recombination.



Commercial value of a line: overall genetic effect (additive+epistatic)

Breeding value of a line: potential for being a good parent (additive)

It can be argued that the linear kernel model estimates the breeding value, whereas the Gaussian kernel model estimates the commercial value.

The breeder can take advantage of some of the epistatic marker effects in regions of low recombination.

The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and to combine the local main and epistatic effects.



Heritability is defined as the percentage of total variation that can be explained by the genotypic component.

One can argue that SPMMs with linear kernels produce estimates of narrow-sense line heritability, and that SPMMs with Gaussian kernels produce estimates of broad-sense line heritability.

We expect the estimates of local heritability developed in this paper to lie between narrow-sense and broad-sense heritability.


Multiple Kernel Learning

In recent years, several methods have been proposed to combine multiple kernel matrices instead of using a single one.

These kernel matrices may correspond to different notions of similarity or may use information coming from multiple sources.

Examples: genomic kernel + pedigree kernel, chromosome model, linear mixed models with linear covariance structure.

A good review and taxonomy of multiple kernel learning algorithms in the machine learning literature can be found in Gönen and Alpaydın (2011).


Multiple Kernel Learning

Multiple kernel learning methods use multiple kernels by combining them into a single one via a combination function.

The most commonly used combination function is linear. Given kernels K1, K2, ..., Kp, a linear combination is of the form

K = η1K1 + η2K2 + ... + ηpKp.

The kernel weights η1, η2, ..., ηp are usually assumed to be positive, which corresponds to calculating a kernel in the combined feature spaces of the individual kernels. We will also assume that the weights sum to one.
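A linear combination with non-negative weights that sum to one can be sketched as follows. The function name `combine_kernels` and the toy 2 × 2 matrices are assumptions for illustration; the raw weights are rescaled so that the resulting η values sum to one.

```python
def combine_kernels(kernels, raw_weights):
    """Return (K, eta) where K = sum_m eta_m * K_m and eta sums to one."""
    if any(w < 0 for w in raw_weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(raw_weights)
    eta = [w / total for w in raw_weights]
    n = len(kernels[0])
    K = [
        [sum(e * Km[i][j] for e, Km in zip(eta, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return K, eta

# Two hypothetical kernel matrices for the same two individuals.
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[1.0, 0.0], [0.0, 1.0]]
K_combined, eta = combine_kernels([K1, K2], [3.0, 1.0])
```

Because each K_m is non-negative definite and the weights are non-negative, the combined matrix is again a valid kernel matrix.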


Interaction Terms

The components of K are usually input variables from different sources or different kernels calculated from the same input variables.

The kernel K can also include interaction components like Ki ⊙ Kj, Ki ⊗ Kj, or perhaps −(Ki − Kj) ⊙ (Kj − Ki).

For example, if KE is the environment kernel matrix and KG is the genetic kernel matrix, then a component KE ⊙ KG can be used to capture the gene by environment interaction effects.
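A small sketch of the KE ⊙ KG component, assuming both kernels are first expanded to the observation level so that they have the same dimension. The helper names, the toy relationship values, and the two-genotype by two-environment layout are hypothetical.

```python
def hadamard(A, B):
    # Element-wise (Hadamard) product of two matrices of equal size.
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def expand(K, labels):
    # Expand a kernel over levels to a kernel over observations.
    return [[K[i][j] for j in labels] for i in labels]

K_geno = [[1.0, 0.5], [0.5, 1.0]]   # assumed similarity between 2 genotypes
K_env = [[1.0, 0.2], [0.2, 1.0]]    # assumed similarity between 2 environments

# Four observations: (g0, e0), (g0, e1), (g1, e0), (g1, e1).
geno_of_obs = [0, 0, 1, 1]
env_of_obs = [0, 1, 0, 1]

K_G = expand(K_geno, geno_of_obs)
K_E = expand(K_env, env_of_obs)
K_GE = hadamard(K_E, K_G)           # gene by environment interaction kernel
```

Two observations are similar under K_GE only when both their genotypes and their environments are similar.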


Mixed Models with Heterogeneous Covariance Structures

Models in Burgueño et al. (2007)

Separating the main effects of genotype (× genotype) from the genotype (× genotype) × environment interaction. Data from g genotypes, s sites, and r replicates.

$$y = X\beta + Z_g(a + i) + Z_{ge}(ae + ie) + Z_r r + e$$

$a \sim N(0, G_a)$; $i \sim N(0, G_i)$; $ae \sim N(0, G_{ae})$; $ie \sim N(0, G_{ie})$; $r \sim N(0, R)$; $e \sim N(0, N)$.

$R = \Sigma_r \otimes I$, $N = \Sigma_n \otimes I$; $\Sigma_r$ and $\Sigma_n$ diagonal.

$G_a = \Sigma_g \otimes A$. $G_i = \Sigma_i \otimes A \odot A$. $G_{ae} = \Sigma_{ge} \otimes A$. $G_{ie} = \Sigma_{ie} \otimes A \odot A$.


Mixed Models with Heterogeneous Covariance Structures

Models in Burgueño et al. (2007)

Why A ⊙ A?

"If one assumes no dominance, all terms will vanish except the terms for additive and additive x additive variances, which will take the form

$$C_{i'i} = 2 f_{i'i} \sigma_a^2 + (2 f_{i'i})^2 \sigma_{aa}^2$$

where $f_{i'i}$ is the COP between individuals i′ and i, $\sigma_a^2$ is the additive genetic variance, and $\sigma_{aa}^2$ is the additive x additive genetic variance. Assuming linkage and identity equilibrium, it seems justified to use $(2 f_{i'i})^2$, which in matrix notation can be represented by (A ⊙ A) = A as the coefficient of the additive x additive component (where ⊙ is the element-wise multiplication operator) (Falconer and Mackay, 1996)."


Learning Kernel Weights

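Qiu and Lane (2009) propose simple heuristics to select kernel weights in regression problems:

$$\eta_m = \frac{r_m^2}{\sum_{h=1}^{p} r_h^2}$$

and

$$\eta_m = \frac{\sum_{h=1}^{p} M_h - M_m}{(p - 1) \sum_{h=1}^{p} M_h}$$

where $r_m$ is the Pearson correlation coefficient between the true response and the predicted response, and $M_m$ is the mean squared error generated by the regression using the kernel matrix $K_m$ alone. With the factor p − 1 in the denominator, the second set of weights is positive and sums to one.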

Another approach in Qiu and Lane (2009) uses the kernel alignment.
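The two heuristics are easy to compute once each single-kernel regression has been fitted. The sketch below assumes the correlations r_m and mean squared errors M_m are already available; the function names and numbers are illustrative. The error-based weights use the factor p − 1 in the denominator, which makes them positive and makes them sum to one.

```python
def weights_from_correlations(r):
    # eta_m = r_m^2 / sum_h r_h^2
    squared = [value * value for value in r]
    total = sum(squared)
    return [s / total for s in squared]

def weights_from_mse(M):
    # eta_m = (sum_h M_h - M_m) / ((p - 1) * sum_h M_h)
    p = len(M)
    total = sum(M)
    return [(total - m) / ((p - 1) * total) for m in M]

# Hypothetical results from two single-kernel regressions.
eta_r = weights_from_correlations([0.6, 0.8])
eta_mse = weights_from_mse([1.0, 3.0])
```

In both cases the kernel that predicts better on its own receives the larger weight.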


Multiple Kernels Using Mixed Models 1

In the context of SPMMs, we propose using weights that are proportional to the estimated variances attributed to the kernels.

One possible approach is to use an SPMM with multiple kernels of the form

$$y = X\beta + Z_1 g_1 + Z_2 g_2 + \ldots + Z_k g_k + e \qquad (2)$$

where $g_j \sim N_{q_k}(0, \sigma_{g_j}^2 K_j)$ for j = 1, 2, ..., k. Let $\sigma_{g_j}^2$ for j = 1, 2, ..., k and $\sigma_e^2$ be the estimated variance components.

Under this model, calculate local heritabilities as

$$h_m^2 = \frac{\sigma_{g_m}^2}{\sum_{\ell=1}^{k} \sigma_{g_\ell}^2 + \sigma_e^2}$$

for m = 1, 2, ..., k.



Another model incorporates the marginal variance contribution for each kernel matrix.

For this we use the following SPMM:

$$y = X\beta + Z_j g_j + Z_{-j} g_{-j} + e \qquad (3)$$

where $g_j \sim N_{q_k}(0, \sigma_{g_j}^2 K_j)$ for j = 1, 2, ..., k. $Z_{-j}$ and $g_{-j}$ correspond to the design matrix and the random effects for the input components other than the ones in group j.

In this case, calculate local heritabilities as

$$h_m^2 = \frac{\sigma_{g_m}^2}{\sigma_{g_m}^2 + \sigma_{g_{-m}}^2 + \sigma_{e_m}^2}$$

for m = 1, 2, ..., k.



A simpler approach is to use a separate SPMM for each kernel.

Let $\sigma_{g_m}^2$ and $\sigma_{e_m}^2$ be the estimated variance components from the SPMM in (1) with kernel K = K_m.

Let

$$h_m^2 = \frac{\sigma_{g_m}^2}{\sigma_{g_m}^2 + \sigma_{e_m}^2}.$$

Note that, in this case, the markers corresponding to the random effect $g_{-j}$, which mainly accounts for the sample structure, can now be incorporated as fixed effects via their first principal components.


Estimation of Parameters

After the local heritabilities are obtained, calculate the kernel weights η1, η2, ..., ηp as

$$\eta_m = \frac{h_m^2}{\sum_{h=1}^{p} h_h^2}. \qquad (4)$$

The parameters of the models in (1), (2) and (3) can be estimated by maximizing the likelihood or the restricted (or residual, or reduced) likelihood (REML).

Very fast algorithms exist for estimating the parameters of the single kernel model in (1).
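Given variance component estimates from separate single-kernel fits, the local heritabilities and the weights in (4) follow directly. The sketch below assumes the variance components have already been estimated (for example by REML); the function names and the numbers are hypothetical.

```python
def local_heritabilities(var_g, var_e):
    # h2_m = sigma2_gm / (sigma2_gm + sigma2_em), one separate SPMM per kernel
    return [g / (g + e) for g, e in zip(var_g, var_e)]

def heritability_weights(h2):
    # eta_m = h2_m / sum_h h2_h, equation (4)
    total = sum(h2)
    return [h / total for h in h2]

# Hypothetical variance component estimates for three local kernels.
var_g = [1.0, 3.0, 0.0]
var_e = [1.0, 1.0, 2.0]
h2 = local_heritabilities(var_g, var_e)
eta = heritability_weights(h2)
```

A region whose kernel explains no variance receives weight zero and drops out of the combined kernel.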


Local kernels

Define regions or groups in the genome and compute a separate kernel matrix for each group or region.

The regions can be overlapping or discrete.

If some markers are associated with each other in terms of linkage or function, it might be useful to combine them.

The whole genome can be divided physically into chromosomes and chromosome arms. Further divisions could be based on recombination hot-spots, or merely on local proximity.

We could calculate a separate kernel for introns and exons, and for non-coding, promoter or repressor sequences.

We can also group markers based on their effects on low-level traits like lipids, metabolites and gene expression, or based on their allele frequencies.

When no such guide is present, one can use a hierarchical clustering of the variables.


Kernels: Kernel Scanning

This approach is similar to the one in Kizilkaya, where the chromosomes are scanned with windows of 5 consecutive markers.

M: q × p matrix of p markers on q lines, partitioned with respect to the chromosomes as (M1, M2, ..., Mc), where Mk has pk columns.

Cumulative distances based on LD between markers in each chromosome: pk for k = 1, 2, ..., c.

Based on pk, calculate the block diagonal p × p kernel matrix S.

The kth column of S is sk.

A local kernel matrix Kk at position k is obtained by using the weighted marker matrix M diag(sk)^{1/2} in the kernel matrix calculations.

The kernel scanning approach involves calculating a kernel matrix at each selected marker location across the genome.

By adjusting the kernel width parameter, we are able to determine the smoothness and locality of these kernel matrices.
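One way to picture kernel scanning is sketched below. It assumes a Gaussian weighting of markers by their distance to the focal marker (zero across chromosomes) and a linear local kernel; the slides do not specify these choices, so the function names, the weighting function and the width parameter h are assumptions for illustration only.

```python
import math

def position_weights(positions, chromosomes, k, h):
    # Column k of an assumed block diagonal S: markers on the same chromosome
    # are weighted by their distance to marker k, all others get weight zero.
    return [
        math.exp(-((positions[k] - positions[l]) ** 2) / h)
        if chromosomes[l] == chromosomes[k] else 0.0
        for l in range(len(positions))
    ]

def local_linear_kernel(M, s):
    # Linear kernel on the weighted markers: K = M diag(s) M'
    q = len(M)
    return [
        [sum(w * M[i][l] * M[j][l] for l, w in enumerate(s)) for j in range(q)]
        for i in range(q)
    ]

# Hypothetical data: two lines, three markers, the third on another chromosome.
M = [[1, 1, 1], [1, -1, 1]]
positions = [0.0, 1.0, 0.0]
chromosomes = [1, 1, 2]

s_0 = position_weights(positions, chromosomes, k=0, h=1.0)
K_0 = local_linear_kernel(M, s_0)
```

A small h concentrates the weight on the focal marker, while a large h approaches a chromosome-wide kernel.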


I shrunk the relationship matrices too!

When the number of markers in a region is less than the number of individuals in the training set, the kernel matrix for this region becomes singular or ill-conditioned.

In these cases we can use shrinkage approaches to obtain a well-conditioned positive definite kernel.

Shrinkage towards the identity matrix is not suitable for use with the SPMM, since this involves allocating a fixed proportion of the error variance to the variance of the random effects.

We instead propose and use shrinkage estimators that aim to introduce sparsity in the off-diagonal elements of the kernel matrix. Many algorithms have been devised for learning sparse covariance matrices in recent years.

Some sparse covariance estimation techniques are implemented in the R package "huge".


Sparse relationship matrix

In addition to a possible decrease in computational burden through the use of sparse matrix methods in mixed model parameter estimation, we can produce graphical representations of the kernel matrices.

For a normally distributed random vector, independence between two components is implied by zero covariance between the components. More interestingly, conditional independence between two components is implied by a zero entry in the inverse covariance matrix.
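The distinction between marginal and conditional independence can be checked numerically. The sketch below uses a hypothetical 3 × 3 covariance matrix with entries 0.5^|i−j| and a small Gauss-Jordan inversion written for this example: components 1 and 3 are correlated, yet the corresponding entry of the inverse is zero, so they are conditionally independent given component 2.

```python
def invert(A):
    # Gauss-Jordan inversion with partial pivoting (small dense matrices only).
    n = len(A)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(A)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Covariance with entries 0.5 ** |i - j|: every pair is marginally correlated.
sigma = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
precision = invert(sigma)
```

In a graph built from the nonzero off-diagonal entries of the inverse, this example has edges 1-2 and 2-3 but no edge 1-3.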


Sparse relationship matrix on graph

Figure: Meinshausen & Bühlmann graph estimation

Sparse relationship matrix on graph-GBS-Stem rust data

Figure: GBS-Stem Rust

Sparse relationship matrix on graph: cap123 FHB DON data


Hypothesis Testing 1

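Under Model (1), twice the log-likelihood of y given the parameters β, $\sigma_e^2$ and $\lambda = \sigma_g^2/\sigma_e^2$ is, up to a constant,

$$L(\beta, \sigma_e^2, \lambda) = -n \log \sigma_e^2 - \log|V_\lambda| - \frac{(y - X\beta)' V_\lambda^{-1} (y - X\beta)}{\sigma_e^2} \qquad (5)$$

where $V_\lambda = I_n + \lambda Z K Z'$ and n is the size of the vector y.

Twice the residual log-likelihood (Patterson, Harville) is, up to a constant,

$$RL(\beta, \sigma_e^2, \lambda) = -(n - p) \log \sigma_e^2 - \log|V_\lambda| - \log|X' V_\lambda^{-1} X| - \frac{(y - X\hat{\beta}_\lambda)' V_\lambda^{-1} (y - X\hat{\beta}_\lambda)}{\sigma_e^2} \qquad (6)$$

where $\hat{\beta}_\lambda = (X' V_\lambda^{-1} X)^{-1} X' V_\lambda^{-1} y$ is the generalized least squares estimate of β for a given λ.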



From both of these likelihoods, a test statistic for the significance of the variance component

H0 : λ = 0 (σ²g = 0)

HA : λ > 0 (σ²g > 0)

can be obtained by calculating the likelihood ratio statistic (Cox and Hinkley 1974).



Under standard regularity conditions, the null distribution of the likelihood ratio test statistic is a χ2 distribution with degrees of freedom given by the difference in the number of parameters between the null and alternative hypotheses.

However, because λ = 0 lies on the boundary of the parameter space, these conditions do not hold here. Self and Liang (1987) showed that the asymptotic distribution is a weighted mixture of χ2 distributions. For the SPMM in (1) they recommended using an equally weighted mixture of the χ2(0) and χ2(1) distributions, where χ2(0) refers to a distribution degenerate at 0.

In simulation studies, Pinheiro and Bates (2000) and Morrell (1998) found that equal contributions work well with the residual log-likelihood, whereas a 0.65 to 0.35 mixture works better for the log-likelihood.

A finite sample null distribution was recommended for the SPMM in (1) in Ruppert (2003), and it was shown that the mixture proportions depend on the kernel matrix.
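The mixture p-value is simple to compute, because the upper tail of a χ2(1) variable at x equals erfc(√(x/2)). The sketch below assumes the likelihood ratio statistic has already been obtained as the difference between the maximized values of (5) or (6) under HA and H0 (both are already twice the log-likelihood); the function names and the default weight are illustrative.

```python
import math

def chi2_1_upper_tail(x):
    # P(chi2 with 1 degree of freedom > x), valid for x >= 0
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lrt, weight_chi2_0=0.5):
    """p-value under a mixture of chi2(0) (point mass at 0) and chi2(1)."""
    if lrt <= 0.0:
        return 1.0
    return (1.0 - weight_chi2_0) * chi2_1_upper_tail(lrt)

# The 0.95 quantile of chi2(1) is about 3.8415.
p_equal = mixture_pvalue(3.841458820694124)
# 0.65 to 0.35 mixture suggested for the ordinary log-likelihood.
p_ml = mixture_pvalue(3.841458820694124, weight_chi2_0=0.65)
```

With equal weights the p-value is half of the naive χ2(1) p-value, so ignoring the boundary makes the test conservative.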


Testing Scheme: Hierarchical testing scheme

Relevant regions are divided further into subregions, and the procedure is repeated to the desired level of detail.

Nevertheless, suitable hierarchical testing procedures have been developed. Blanchard and Geman (2005) proposed and analyzed several hierarchical designs in terms of their cost / power properties. Multiple testing procedures in which coarse-to-fine hypotheses are tested sequentially have been proposed to control the family-wise error rate or the false discovery rate (Benjamini and Yekutieli (2003), Meinshausen (2007)).

These procedures can be used with the "keep rejecting until first acceptance" scheme to test hypotheses in a hierarchy.


Meinshausen’s hierarchical testing procedure

Meinshausen’s hierarchical testing procedure controls the family-wise error rate by adjusting the significance levels of the single tests in the hierarchy.

The procedure starts by testing the root node H0 at level α.

When a parent hypothesis is rejected, one continues with testing all the child nodes of that parent.

The significance level to be used at each node H is adjusted by a factor proportional to the number of variables in that node:

$$\alpha_H = \alpha \frac{|H|}{|H_0|}$$

where |.| denotes the cardinality of a set.
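The top-down scheme can be sketched as a traversal of the hierarchy in which children are tested only after their parent is rejected. In this illustration nodes are tuples of variable indices, and the tree, the p-values and the function name are hypothetical; in practice each p-value would come from a variance component test for the corresponding region.

```python
def hierarchical_test(root, children, pvalue, alpha=0.05):
    """Return the rejected nodes, testing each node H at alpha * |H| / |H0|."""
    rejected = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        level = alpha * len(node) / len(root)
        if pvalue[node] <= level:
            rejected.append(node)
            # Only the children of rejected nodes are tested further.
            queue.extend(children.get(node, []))
    return rejected

root = (0, 1, 2, 3)
children = {
    root: [(0, 1), (2, 3)],
    (0, 1): [(0,), (1,)],
    (2, 3): [(2,), (3,)],
}
pvalue = {
    root: 0.001, (0, 1): 0.01, (2, 3): 0.2,
    (0,): 0.02, (1,): 0.005, (2,): 0.001, (3,): 0.5,
}
rejected = hierarchical_test(root, children, pvalue)
```

Node (2,) has a small p-value but is never tested, because its parent (2, 3) was not rejected.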


Meinshausen’s hierarchical testing procedure


Simulated Example

[Figure: outhieri$hypothesistests plotted against Index]

[Figure: values between 0.4 and 0.8 shown for the categories 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225 and linear]


Figure: Local kernel weights for non-overlapping sections

[Figure: Hkv plotted against 1:length(Hkv)]


Balancing short and long term gains

While we may have confidence that GS can accelerate short-term gain, no such confidence is justified for long-term gain.

In the weighted GS model of Jean Luc, markers for which the favorable allele had a low frequency were weighted more heavily to avoid losing such alleles.

We recommend a similar approach, which aims to conserve rare but favorable alleles, for balancing short-term and long-term gains from GS.



Let $g_{ji}$ for regions j = 1, 2, ..., k and individuals i = 1, 2, ..., N be the EBLUPs of the random effect components that correspond to the k local kernels, and let $p_{ji}$ denote the estimated density value of the alleles in region j for individual i.

A selection criterion in the spirit of Goddard and Jean Luc for the ith individual is then

$$c_i = \sum_{j=1}^{k} \frac{g_{ji}}{1 + \sqrt{p_{ji}}}. \qquad (7)$$

However, this criterion also down-weights favorable but common alleles.
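Criterion (7) is a sum over regions of the regional EBLUP divided by one plus the square root of the allele density. A minimal sketch, assuming the EBLUPs and densities are stored as region-by-individual lists; the function name and values are hypothetical.

```python
import math

def selection_criterion(g, p):
    """c_i = sum_j g[j][i] / (1 + sqrt(p[j][i])), equation (7)."""
    n_regions = len(g)
    n_individuals = len(g[0])
    return [
        sum(g[j][i] / (1.0 + math.sqrt(p[j][i])) for j in range(n_regions))
        for i in range(n_individuals)
    ]

# Two regions (rows) and two individuals (columns); values are illustrative.
g = [[1.0, 2.0], [3.0, -1.0]]
p = [[0.0, 1.0], [0.25, 0.0]]
c = selection_criterion(g, p)
```

The first individual scores higher here because its favorable regional effects sit on alleles with low density.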



A weighting scheme that up-weights both rare and common favorable alleles: let $g_{j(n)}$ be the maximum EBLUP value among the individuals for region j and let $\eta_j$ be the weight of kernel j. The selection criterion is

$$c_i = \sum_{j=1}^{k} \frac{\eta_j}{2\pi h_1 \mathrm{sd}(g_j) h_2 \mathrm{sd}(p_j)} \exp\left\{ -\frac{1}{2} \left[ \frac{(g_{ji} - g_{j(n)})^2}{h_1^2 \mathrm{var}(g_j)} + \frac{p_{ji}^2}{h_2^2 \mathrm{var}(p_j)} \right] \right\} \qquad (8)$$

where the constants h1 and h2 are selected by the breeder based on the preferences given to short-term and long-term gains, respectively.
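Criterion (8) can be sketched as below. The slides do not say whether sd and var are sample or population quantities, so the use of the sample standard deviation from Python's `statistics` module is an assumption, as are the function name and the toy values.

```python
import math
import statistics

def density_weighted_criterion(g, p, eta, h1=1.0, h2=1.0):
    """Equation (8): g[j][i] are EBLUPs, p[j][i] densities, eta[j] kernel weights."""
    n_individuals = len(g[0])
    scores = [0.0] * n_individuals
    for g_j, p_j, eta_j in zip(g, p, eta):
        sd_g = statistics.stdev(g_j)   # assumption: sample standard deviation
        sd_p = statistics.stdev(p_j)
        g_max = max(g_j)               # g_j(n): best EBLUP in region j
        norm = eta_j / (2.0 * math.pi * h1 * sd_g * h2 * sd_p)
        for i in range(n_individuals):
            z = ((g_j[i] - g_max) ** 2) / (h1 ** 2 * sd_g ** 2) \
                + (p_j[i] ** 2) / (h2 ** 2 * sd_p ** 2)
            scores[i] += norm * math.exp(-0.5 * z)
    return scores

# One region, three individuals; values are illustrative.
g = [[0.0, 1.0, 2.0]]
p = [[2.0, 1.0, 0.0]]
c = density_weighted_criterion(g, p, eta=[1.0])
```

A larger h1 tolerates EBLUPs further from the regional maximum, and a larger h2 tolerates higher allele densities.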



[Figure: four panels titled "Kernel scanning", "Kernel scanning with density weighting", "Kernel scanning (best 50)" and "Kernel scanning with density weighting (best 50)"]


Final remarks

Knowledge transfer between experiments


Robust to Missing Data

The local kernels use information collected over a region in the genome and thus will not be affected by a few missing or outlying data points, so this approach is also robust to missing data and outliers.


Chromosomal Breeding/ gene transplant

Breed good chromosomes, combine them.


Chromosomal Breeding/ gene transplant

