
Statistical Analysis of Genetic and Phenotypic Data for Breeders:

Hands on Practical Sessions (GBLUP-RR)

Paulino Pérez 1

José Crossa 2

1 ColPos-México 2 CIMMyT-México

June, 2015.

CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders:Hands on Practical Sessions (GBLUP-RR)1/34


Contents

1 General comments

2 GBLUP-Ridge Regression

3 Application examples

4 Biplot from marker effects

5 Extension of BRR to include infinitesimal effect


General comments

Remember,

1 A simple model frequently used in plant breeding states that the phenotypic value of an individual (P) is the sum of the genetic value (G) and the residual environmental effect (E):

$$P = G + E, \tag{1}$$

where G includes additive, dominance and epistatic effects.

2 A model that includes only additive effects (A) can be derived easily from (1) and expressed as follows,

$$P = A + E', \tag{2}$$

where $E'$ includes the effects that are non-additive.


The breeding value (BV) of an individual can be computed from the narrow-sense heritability ($h^2$),

$$BV_i = \mu + h^2 (y_i - \mu),$$

where $\mu$ is the mean phenotypic value of the population and $y_i$ is the phenotypic value of individual i.

Obviously, information on both parents and offspring is needed to compute this.
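As a small worked illustration of the formula above, the following sketch computes breeding values for five individuals. The phenotypes in `y` and the value `h2 = 0.4` are made-up assumptions, and $\mu$ is simply replaced by the sample mean.

```r
# Hypothetical phenotypic values for five individuals (illustrative only)
y  <- c(4.2, 5.1, 3.8, 6.0, 4.9)
h2 <- 0.4            # assumed narrow-sense heritability
mu <- mean(y)        # population mean, estimated here by the sample mean

# BV_i = mu + h2 * (y_i - mu): deviations from the mean shrink by a factor h2
BV <- mu + h2 * (y - mu)
print(round(BV, 3))
```

Because $h^2 < 1$, every breeding value lies closer to the mean than the corresponding phenotype does.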


In Genomic Selection (GS), genetic values are approximated using linear regression (Meuwissen et al., 2001), that is:

$$y_i = g_i + e_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + e_i \tag{3}$$

Figure: Relationships between marker genotypes ($x_{1i}$: 0 and 1) and phenotypes ($y_i$) of the individuals (open circles) in a training population. If the marker genotype is correlated with the phenotype, segregation is modelled using the bold line (taken from Nakaya and Isobe, 2012).


In GS it is possible to obtain Genomic Estimated Breeding Values (GEBVs for short). This is done simply by adding up the marker effects estimated in a training population, according to the marker genotypes of each individual, that is:

$$GEBV_i = \sum_{j=1}^{p} x_{ij}\hat\beta_j \tag{4}$$
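The sum in (4) is a matrix-vector product, which the sketch below illustrates. The marker matrix `X` and the effects `beta_hat` are simulated placeholders (assumptions), not estimates from real data.

```r
set.seed(1)
n <- 6; p <- 10
# Hypothetical 0/1 marker genotypes (rows = individuals, columns = markers)
X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
# Hypothetical marker effects, standing in for estimates from a training population
beta_hat <- rnorm(p, mean = 0, sd = 0.1)

# Equation (4) for all individuals at once: GEBV = X %*% beta_hat
GEBV <- drop(X %*% beta_hat)

# The same quantity for individual 1, written as the explicit sum over markers
GEBV_1 <- sum(X[1, ] * beta_hat)
print(GEBV)
```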

Next we show how to obtain the predictions $\hat y_i$ (and in some cases $\hat\beta_j$) using several models.


Figure 1: Graphical representation of parametric and non-parametric methods commonly used in whole-genome prediction.

In this presentation we will focus on Ridge Regression.


Figure 2: Prior densities of regression coefficients with Markers. Curves shown: Gaussian, Double Exponential, Scaled-t (5 df) and BayesC ($\pi = 0.25$); horizontal axis $\beta_j$, vertical axis $p(\beta_j)$.


GBLUP-Ridge Regression

GBLUP-RR

This is the most basic model used in GS. Let

$$y_i = g_i + e_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + e_i.$$

Marker effects are obtained by solving the following optimization problem,

$$\min_{\beta,\lambda}\left\{\Big(y - \sum_{j} X_j\beta_j\Big)'\Big(y - \sum_{j} X_j\beta_j\Big) + \lambda\sum_{j}\beta_j^2\right\}, \tag{5}$$

where $\lambda > 0$ is a regularization parameter.

Notes:

1 $\lambda$ is unknown and can be selected by cross-validation.
2 We need to minimize a “penalized sum of squares”.
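Note 1 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the grid `lambdas` and the 5-fold split are arbitrary choices, and `ridge_fit` is a hypothetical helper written for this example (it centres `y` and the columns of `X` before minimizing the penalized sum of squares).

```r
set.seed(123)
n <- 100; p <- 200
# Simulated 0/1 markers and phenotypes (illustrative assumptions, not the wheat data)
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
beta <- rnorm(p, 0, 0.1)
y <- drop(X %*% beta + rnorm(n))

# Ridge estimate for a fixed lambda, using centred y and centred columns of X
ridge_fit <- function(X, y, lambda) {
  mu <- mean(y)
  cm <- colMeans(X)
  Xc <- sweep(X, 2, cm)
  b  <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - mu))
  list(mu = mu, cm = cm, b = drop(b))
}

lambdas <- c(1, 10, 50, 100, 500, 1000)      # candidate grid (assumed)
folds   <- sample(rep(1:5, length.out = n))  # random 5-fold assignment

# Mean squared prediction error over the held-out folds, for each lambda
cv_mse <- sapply(lambdas, function(lambda) {
  mean(sapply(1:5, function(k) {
    tst  <- folds == k
    fit  <- ridge_fit(X[!tst, ], y[!tst], lambda)
    pred <- fit$mu + sweep(X[tst, , drop = FALSE], 2, fit$cm) %*% fit$b
    mean((y[tst] - pred)^2)
  }))
})

lambda_cv <- lambdas[which.min(cv_mse)]
print(data.frame(lambda = lambdas, cv_mse = cv_mse))
```

The value `lambda_cv` is the grid point with the smallest cross-validated error; a finer grid around it would usually be the next step.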


The optimization problem has a closed-form solution,

$$\hat\beta = (X'X + \lambda I)^{-1}X'\tilde y,$$

where $\tilde y = y - \mu 1$. Unfortunately, we need to know the value of $\lambda$ to use this solution.

The problem can be solved easily using the Bayesian framework.

Let $\beta \sim N(0, \sigma^2_\beta I)$ and $e \sim N(0, \sigma^2_e I)$, and $u = X\beta$; then model (3) can be written as:

$$y = \mu 1 + u + e \tag{6}$$

Model (6) is known as GBLUP. Note that $u \sim N(0, \sigma^2_\beta XX')$.
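The link between the ridge solution and model (6) can be checked numerically. In the sketch below (simulated data, $\lambda$ fixed at an arbitrary value, $\mu$ replaced by the sample mean, all of which are assumptions), the fitted genetic values $X\hat\beta$ from the $p \times p$ ridge system coincide with those obtained from an $n \times n$ system that involves only $XX'$.

```r
set.seed(42)
n <- 20; p <- 50
X <- matrix(rbinom(n * p, 1, 0.5), n, p)   # simulated 0/1 markers (assumption)
y <- rnorm(n)                              # simulated phenotypes (assumption)
lambda  <- 10                              # lambda treated as known here
y_tilde <- y - mean(y)                     # mu replaced by the sample mean

# Ridge regression: solve a p x p system for the marker effects
beta_hat <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y_tilde))
u_rr <- drop(X %*% beta_hat)

# GBLUP form: solve an n x n system directly for u = X beta
K <- tcrossprod(X)                         # XX'
u_gblup <- drop(K %*% solve(K + lambda * diag(n), y_tilde))

print(max(abs(u_rr - u_gblup)))
```

When $p$ is much larger than $n$, as is typical with dense markers, the second form is the cheaper one because it never builds a $p \times p$ matrix.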


Training and testing sets

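Note also that the covariance matrix of u involves the product $XX'$, which is proportional to the Genomic Relationship Matrix proposed by VanRaden (2008).

We will assume that $u \sim N(0, \sigma^2_u G)$ with $G = XX'/k$. The mixed-model equations for (6) are as follows:

$$
\begin{pmatrix}
1'1\,\sigma_e^{-2} & 1'\sigma_e^{-2} \\
1\,\sigma_e^{-2} & I\sigma_e^{-2} + G^{-1}\sigma_u^{-2}
\end{pmatrix}
\begin{pmatrix} \hat\mu \\ \hat u \end{pmatrix}
=
\begin{pmatrix} 1'y\,\sigma_e^{-2} \\ y\,\sigma_e^{-2} \end{pmatrix}
\tag{7}
$$

$\hat u$ and $\hat\mu$ are obtained by solving the mixed-model equations, assuming that the variance components $\sigma^2_e$ and $\sigma^2_u$ are known.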
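A minimal numerical sketch of solving the mixed-model equations is given below. Everything in it is an assumption made for illustration: the markers and phenotypes are simulated, $k = p$ is one possible scaling of $G$, and the variance components `varE` and `varU` are simply fixed. The result is compared with the equivalent generalized least squares and BLUP expressions based on $V = G\sigma^2_u + I\sigma^2_e$.

```r
set.seed(7)
n <- 15; p <- 40
X <- matrix(rnorm(n * p), n, p)   # simulated markers (assumption); p > n keeps G invertible
G <- tcrossprod(X) / p            # G = XX'/k with k = p (one possible choice of k)
y <- rnorm(n, mean = 5)           # simulated phenotypes (assumption)
varE <- 1.0; varU <- 0.5          # variance components treated as known
one <- rep(1, n)

# Left- and right-hand sides of the mixed-model equations
LHS <- rbind(
  cbind(sum(one) / varE, t(one) / varE),
  cbind(one / varE,      diag(n) / varE + solve(G) / varU)
)
RHS <- c(sum(y), y) / varE
sol <- solve(LHS, RHS)
mu_hat <- sol[1]
u_hat  <- sol[-1]

# Check against the equivalent GLS / BLUP expressions based on V = G varU + I varE
V    <- G * varU + diag(n) * varE
Vinv <- solve(V)
mu_gls <- sum(Vinv %*% y) / sum(Vinv)
u_blup <- drop(varU * G %*% Vinv %*% (y - mu_gls))
print(c(mu_hat = mu_hat, mu_gls = mu_gls))
```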


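If we have individuals for training and testing, we can partition G and u as follows,

$$
G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix},\quad
u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix},\quad
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},\quad
1 = \begin{pmatrix} 1_1 \\ 1_2 \end{pmatrix},
$$

where 1 = individuals in the training set and 2 = individuals in the testing set. $\hat\mu$ and $\hat u_1$ are obtained as the solution of the mixed-model equations,

$$
\begin{pmatrix}
1_1'1_1\,\sigma_e^{-2} & 1_1'\sigma_e^{-2} \\
1_1\,\sigma_e^{-2} & I_{11}\sigma_e^{-2} + G_{11}^{-1}\sigma_u^{-2}
\end{pmatrix}
\begin{pmatrix} \hat\mu \\ \hat u_1 \end{pmatrix}
=
\begin{pmatrix} 1_1'y_1\,\sigma_e^{-2} \\ y_1\,\sigma_e^{-2} \end{pmatrix}
$$

The predictions for individuals in the testing set are given by

$$\hat y_2 = \hat\mu 1_2 + G_{21}G_{11}^{-1}\hat u_1$$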
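The prediction step can be sketched in the same way. The data, the split into `trn` and `tst`, and the fixed variance components are all assumptions made for the example; only the training phenotypes enter the mixed-model equations, and the testing individuals are reached through $G_{21}$.

```r
set.seed(11)
n <- 20; p <- 60
X <- matrix(rnorm(n * p), n, p)      # simulated markers (assumption)
G <- tcrossprod(X) / p
y <- rnorm(n, mean = 5)              # simulated phenotypes (assumption)
varE <- 1.0; varU <- 0.5             # variance components treated as known
trn <- 1:15; tst <- 16:20            # 1 = training set, 2 = testing set

G11 <- G[trn, trn]; G21 <- G[tst, trn]
y1  <- y[trn]; n1 <- length(trn)

# Mixed-model equations restricted to the training individuals
LHS <- rbind(
  cbind(n1 / varE,          t(rep(1, n1)) / varE),
  cbind(rep(1, n1) / varE,  diag(n1) / varE + solve(G11) / varU)
)
sol    <- solve(LHS, c(sum(y1), y1) / varE)
mu_hat <- sol[1]
u1_hat <- sol[-1]

# Predictions for the testing set: y2_hat = mu_hat * 1_2 + G21 G11^{-1} u1_hat
y2_hat <- mu_hat + drop(G21 %*% solve(G11, u1_hat))
print(round(y2_hat, 3))
```

The phenotypes `y[tst]` are never used in the fit, so comparing them with `y2_hat` gives an honest measure of predictive ability.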


Application examples

Wheat dataset

Data for n = 599 wheat lines evaluated in 4 environments, from the wheat improvement program of CIMMyT. The dataset includes p = 1279 molecular markers ($x_{ij}$, i = 1, ..., n, j = 1, ..., p), coded as 0/1. The pedigree information is also available.

Let's load the dataset in R:

1 Load R
2 Install the BGLR package (if not yet installed)
3 Load the package
4 Load the data
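The four steps above can be sketched as follows. The installation line is left commented out and assumes access to CRAN; the dimension checks at the end are an addition for illustration and rely on the objects that `data(wheat)` creates in the BGLR package.

```r
# Step 2 (run once, only if BGLR is not yet installed):
# install.packages("BGLR")

# Step 3: load the package
library(BGLR)

# Step 4: load the wheat data; this creates wheat.X, wheat.Y, wheat.A and wheat.sets
data(wheat)

dim(wheat.X)        # marker matrix: 599 lines x 1279 markers
dim(wheat.Y)        # phenotypes: 599 lines x 4 environments
dim(wheat.A)        # pedigree-based relationship matrix: 599 x 599
table(wheat.sets)   # assignment of lines to disjoint sets
```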


Figure 3: Loading the BGLR package and the wheat dataset.


You can explore the matrix of molecular markers (MM) and the pedigree matrix within R,

    fix(wheat.X)
    fix(wheat.A)


Let's assume that we want to predict the grain yield for environment 1 using ridge regression or, equivalently, GBLUP. We do not know the values of $\sigma^2_e$ and $\lambda$, so we estimate them from the data.

We will use the function BGLR. The R code below fits the RR model using a Bayesian approach with non-informative priors for $\sigma^2_e$ and $\sigma^2_\beta$,

    rm(list=ls())
    library(BGLR)
    data(wheat)

    X=wheat.X
    Y=wheat.Y

    #Linear predictor
    ETA=list(list(X=X,model="BRR"))

    #Or
    #ETA=list(Markers=list(X=X,model="BRR"))

    fmR<-BGLR(y=Y[,1],ETA=ETA,nIter=10000,burnIn=5000,thin=10)
    plot(fmR$yHat,Y[,1])


The figure shows observed vs. predicted grain yield.

Predictions $\hat y = \hat\mu + X\hat\beta$, and estimates of $\sigma^2_e$ and $\sigma^2_\beta$, can be obtained easily in R

    > fmR$yHat
    > fmR$varE
    [1] 0.5481523
    > fmR$ETA[[1]]$varB
    [1,] 0.002721897
    >

From the output above, $\hat\lambda = \hat\sigma^2_e/\hat\sigma^2_\beta = 0.5481523/0.002721897 \approx 201.39$


A complete list of the objects attached to fmR can be obtained by typing

    > names(fmR)
     [1] "y" "whichNa" "saveAt" "nIter" "burnIn" "thin" "weights"
     [8] "verbose" "response_type" "df0" "S0" "yHat" "SD.yHat" "mu"
    [15] "SD.mu" "varE" "SD.varE" "fit" "ETA"

A complete description of the output can be found in the BGLR reference manual.

The GEBVs can be obtained easily in R,

    #GEBVs

    #option 1
    X%*%fmR$ETA[[1]]$b

    #option 2
    fmR$yHat-fmR$mu

    #Or, if the linear predictor was defined with the name Markers,
    #the marker effects are in fmR$ETA$Markers$b


Training and testing set

Let's assume that we want to predict the grain yield for some wheat lines for which we have only the genotypic information,

    #Training and testing set
    sets<-wheat.sets
    y<-Y[,1]
    yNa=y
    whichNa=(sets==2)
    yNa[whichNa]=NA
    fmR<-BGLR(y=yNa,ETA=ETA,nIter=10000,
              burnIn=5000,thin=10)

    #Phenotype on the x axis, prediction on the y axis (same order as in points below)
    plot(y,fmR$yHat,xlab="Phenotype",
         ylab="Pred. Gen. Value",cex=.8,bty="L")

    points(x=y[whichNa],y=fmR$yHat[whichNa],col=2,cex=.8,pch=19)
    legend("topleft",legend=c("training","testing"),bty="n",
           pch=c(1,19),col=c("black","red"))


Figure: scatter plot with axes labelled Phenotype (horizontal) and Pred. Gen. Value (vertical); the legend distinguishes training and testing individuals.

    > MSE.tst<-mean((fmR$yHat[whichNa]-y[whichNa])^2)
    > MSE.tst
    [1] 0.8110028
    > COR.tst<-cor(fmR$yHat[whichNa],y[whichNa])
    > COR.tst
    [1] 0.4338218
    > #Training set: whichNa is logical, so it is negated with !whichNa;
    > #-whichNa would drop only the first element
    > MSE.trn<-mean((fmR$yHat[!whichNa]-y[!whichNa])^2)
    > COR.trn<-cor(fmR$yHat[!whichNa],y[!whichNa])


Questions?


Exercise

Suppose that we want to predict the grain yields for individuals in set 2 and environment 4.

1 Write an R program to solve the problem described above.
2 Obtain the correlations in the training set.
3 Obtain the correlations in the testing set.
4 Write the predictions to a csv (comma-separated values) file so that you can read the file in Excel.


Suppose that we are interested in studying the predictive power of GBLUP. We can perform a simulation study to that end. In this exercise you will perform a 10-fold cross-validation.

1 Write an R program to perform a 10-fold cross-validation; use the object sets to allocate observations to folds.
2 Report your results.


Biplot from marker effects

Biplots (The theory...)

1 A biplot is a two-dimensional representation of a data matrix C showing a point for each of the n observation vectors (rows of C) along with a point for each of the p variables (columns of C); see Gabriel (1971).
2 Perform the SVD of C, that is, $C = UDV'$.
3 Let $U_{n\times q} = [\alpha_1, ..., \alpha_q]$ and $V_{p\times q} = [\gamma_1, ..., \gamma_q]$, with $q = \min(n, p)$.
4 Plot $\alpha_1$ vs $\alpha_2$.
5 Draw arrows; the coordinates of the end of each arrow are given by $\gamma_1$ and $\gamma_2$.
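The steps above can be sketched with base R alone. The matrix `C` below is simulated (an assumption), column-centring before the SVD is a choice made for this example, and the factors 0.3 and 0.33 are arbitrary and only rescale the arrows and their labels so that they fit the point cloud.

```r
set.seed(3)
# Hypothetical matrix C: 30 rows (e.g. markers) by 4 columns (e.g. environments)
C <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4,
            dimnames = list(NULL, paste0("E", 1:4)))
C <- scale(C, center = TRUE, scale = FALSE)   # column-centre before the SVD

s <- svd(C)          # C = U D V'
U <- s$u; V <- s$v

# Points: first two columns of U (alpha_1 vs alpha_2)
plot(U[, 1], U[, 2], xlab = "alpha 1", ylab = "alpha 2", pch = 1)
# Arrows: end points given by the first two columns of V (gamma_1, gamma_2)
arrows(0, 0, 0.3 * V[, 1], 0.3 * V[, 2], length = 0.08, col = 2)
text(0.33 * V[, 1], 0.33 * V[, 2], labels = colnames(C), col = 2)
```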


Figure 4: An example of a Biplot derived from markers. Axes: Comp.1 (50.17%) and Comp.2 (24.27%); points are labelled markers (for example wPt.4600, wPt.3533, c.345897) and the arrows correspond to the environments E1, E2, E3 and E4.


Why is the biplot useful?

1 Points in the biplot are the marker effects projected onto the first two components.

2 The “environmental effects” are displayed as vectors whose coordinates are given by $\gamma_1$ and $\gamma_2$.

3 The length of the vectors approximates the variance accounted for by the specific molecular marker and “environmental effect”.

4 The cosine of the angle between two environments approximates the correlation of the two environments, with an angle of zero indicating a correlation of +1, an angle of 90° (or −90°) a correlation of 0, and an angle of 180° a correlation of −1.
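Point 4 can be illustrated numerically. The sketch below uses simulated effects for four environments (an assumption) and, as a further assumption, represents each environment by the corresponding row of $VD/\sqrt{n-1}$ computed from the centred and scaled matrix. With all components the cosine equals the correlation; with only the first two it is an approximation, which is what the biplot displays.

```r
set.seed(5)
n <- 200
# Hypothetical marker effects in 4 environments, built to be correlated
z <- rnorm(n)
B <- cbind(E1 = z + rnorm(n, sd = 0.5),
           E2 = z + rnorm(n, sd = 0.5),
           E3 = -z + rnorm(n, sd = 0.5),
           E4 = rnorm(n))
Z <- scale(B)                     # centre and scale each column
s <- svd(Z)

# Environment vectors in all q dimensions: rows of V D / sqrt(n - 1)
H <- s$v %*% diag(s$d) / sqrt(n - 1)
rownames(H) <- colnames(B)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Using every component, the cosine reproduces the correlation
cos_full <- cosine(H["E1", ], H["E3", ])
# Using only the first two components (what the biplot shows), it approximates it
cos_2d   <- cosine(H["E1", 1:2], H["E3", 1:2])
print(c(correlation = cor(B[, "E1"], B[, "E3"]), cos_full = cos_full, cos_2d = cos_2d))
```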


R Code

    rm(list=ls())

    #Set the working directory
    setwd("C:/Users/P.P.RODRIGUEZ/Desktop/Slides Paulino/3. GBLUP-RR/examples")

    #Function for biplots
    source("biplot.R")

    #Import the data
    data=read.csv("mean_betas_PMBL.csv",header=TRUE)
    data=data[,2:5]

    #Principal component analysis
    pca.betas=princomp(data,cor=TRUE)

    #Default biplot in R
    biplot(pca.betas)

    #Modified function
    my.biplot.princomp(pca.betas)


Exercise

1 Use the wheat dataset described in the previous slides, fit a GBLUP-BRR model, and save the marker effects for the 4 environments to an Excel-readable file (csv).

2 Use the marker effects obtained in Step 1 to create a biplot.


Extension of BRR to include infinitesimal effect

Extension of basic model to include infinitesimal effect

de los Campos et al. (2009) extended the basic BRR model to include an infinitesimal effect, that is:

$$y_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + u_i + e_i, \tag{8}$$

where $u \sim N(0, \sigma^2_u A)$ and A is the pedigree matrix.

The model can be fitted using standard linear mixed model theory or using Bayesian methods.


Example

    rm(list=ls())
    library(BGLR)
    data(wheat)

    X=wheat.X
    Y=wheat.Y
    A=wheat.A

    #Linear predictor
    ETA=list(list(X=X,model="BRR"),
             list(K=A,model="RKHS"))

    fmR<-BGLR(y=Y[,1],ETA=ETA,nIter=10000,burnIn=5000,thin=10)
    plot(fmR$yHat,Y[,1])


Exercise

Suppose that we want to predict the grain yields for individuals in set 2 and environment 1 using the marker and pedigree information jointly.

1 Write an R program to solve the problem described above.
2 Obtain the correlations in the training set.
3 Obtain the correlations in the testing set.
4 Write the predictions to a csv (comma-separated values) file so that you can read the file in Excel.


Marker based vs Marker + Pedigree based model

How can we design a simulation to study the prediction ability of a model with markers only and of a model that includes markers + pedigree?


Concluding remarks


References

de los Campos, G., J. Hickey, R. Pong-Wong, H. D. Daetwyler and M. P. L. Calus. 2012. Whole Genome Regression and Prediction Methods Applied to Plant and Animal Breeding. Genetics.

Gabriel, K. R. 1971. The Biplot Graphical Display of Matrices with Application to Principal Component Analysis. Biometrika 58: 453-467.

Endelman, J. B. 2011. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. The Plant Genome 4(3): 250-255.
