
GENETIC EVALUATION OF MULTI-BREED BEEF CATTLE

A Thesis

Presented to

The Faculty of Graduate Studies

of

The University of Guelph

by

VANERLEI MOZAQUATRO ROSO

In partial fulfilment of requirements

for the degree of

Doctor of Philosophy

November, 2004

© Vanerlei Mozaquatro Roso, 2004

Advisory Committee: Dr. Stephen P. Miller (Advisor)

Dr. Flávio S. Schenkel

Dr. Gary J. Umphrey

Dr. James W. Wilton

Dr. Lawrence R. Schaeffer

ABSTRACT

GENETIC EVALUATION OF MULTI-BREED BEEF CATTLE

Vanerlei Mozaquatro Roso, University of Guelph, 2004

Advisor: Professor Stephen Paul Miller

Three alternative methods for measuring the degree of connectedness among test

groups (TG), including variance of estimated differences between TG effects (VED),

connectedness rating (CR), and total number of direct genetic links between TG due to

common sires and dams (GLT), which could be routinely used in genetic evaluation

programs, were evaluated. Data were consecutive weights of bulls tested in central

evaluation stations in Ontario, Canada. The prediction error variance of differences in

estimated breeding values of bulls from different TG (PEVD) was assumed the most

adequate measure of connectedness and results from VED, CR, and GLT were compared

relative to PEVD. Average PEVD of pairs of TG can be more accurately predicted on the

basis of GLT than on the basis of either VED or CR. Average PEVD of each TG with all

other test groups can be more accurately predicted on the basis of either CR or GLT.

The GLT, which is not computationally demanding, was used to identify a set

of connected contemporary groups including both purebred and crossbred animals from

beef herds in Ontario. Estimates of variance components, breed additive genetic changes,

direct and maternal breed, dominance, and epistatic loss genetic effects on pre-weaning

weight gain (PWG) were obtained. Both direct and maternal dominance effects were

assumed proportional to breed heterozygosity and showed favourable effects on PWG.

Direct epistatic loss reduced the performance of the animals, whereas maternal epistatic

loss did not significantly affect the PWG. Breeds ranked similarly to what was expected,

but estimates were highly unstable, with high standard errors, possibly due to

multicollinearity, which can result in inaccurate across-breed estimated breeding values.

A framework using ridge regression methods was developed to obtain more stable

estimates of direct and maternal breed, dominance, and epistatic loss effects on PWG

when multicollinearity is of concern. Two generalized methods were applied in the choice

of the ridge parameter. Once the choice of the ridge parameter was made, its reliability

and validity were evaluated through bootstrap resampling procedures. Mean squared error

of prediction (MSEP) of both ridge regression methods was 3% lower than the MSEP

from ordinary least squares. Ridge regression methods were effective in reducing the

multicollinearity involving predictor variables of breed effects.


ACKNOWLEDGEMENTS

I am particularly grateful to my advisor Dr. Stephen P. Miller for giving me the

opportunity to develop my graduate studies at the University of Guelph. His enthusiasm,

encouragement, guidance, and friendship during my graduate program were appreciated. I

would like to extend sincere acknowledgements to the other members of my advisory

committee, Dr. Flávio S. Schenkel, Dr. Gary J. Umphrey, Dr. James W. Wilton, and Dr.

Lawrence R. Schaeffer for their time, advice, and contributions to the manuscript. Thanks

to Dr. Peter G. Sullivan, Dr. Luiz A. Fries, and Dr. Roberto Carvalheiro for their

suggestions.

I would like to acknowledge the faculty, staff, students, and visiting scientists at the

Department of Animal and Poultry Science for their help, kindness, and support, making

my graduate program a pleasant experience.

A special thanks to my friends Flávio, Sandra, Mariana, and Daniel, who made me

feel at home during my stay in Guelph, and to my family, for their continuous support

and love.

I am thankful to my partners at GenSys Consultores Associados in Brazil, Fernanda

V. Brito, Jorge L. P. Severo, Luiz A. Fries, and Mario L. Piccoli for their extra effort to

cover my temporary leave of absence, which allowed me to pursue a Ph.D. at the

University of Guelph.

I would like to thank Beef Improvement Ontario (BIO) for providing data and

financial support, Natural Sciences and Engineering Research Council of Canada, and

Ontario Ministry of Agriculture and Food for financial support, and the Canadian


Foundation for Innovation, Ontario Innovation Trust, and Compaq for supporting the

required computing infrastructure.


TABLE OF CONTENTS

1. General Introduction

2. Degree of connectedness among groups of centrally tested beef bulls
    Abstract
    Introduction
    Material and Methods
        Data
        Statistical model
        Measures of the degree of connectedness
            Prediction error variance of differences in EBV of bulls (â) from different test groups (PEVD)
            Variance of estimated differences between test group effects (ĝ) (VED)
            Connectedness rating (CR)
            Total number of direct genetic links between test groups (GLT)
    Results
        Connectedness
        Prediction of PEVD on the basis of VED, CR, and GLT
            Average PEVD of pairs of TG
                On the basis of VED
                On the basis of CR
                On the basis of GLT
            Average PEVD of each TG with all other TG
                On the basis of VED
                On the basis of CR
                On the basis of GLT
        Simulation of disconnected test groups
    Discussion
    Conclusions

3. Additive, dominance, and epistatic loss effects on pre-weaning gain in crossing of different Bos taurus breeds
    Abstract
    Introduction
    Material and Methods
        Data
        Connectedness analysis
        Predictor variables of fixed genetic effects
            Breed additive effects
            Dominance effects
            Epistatic loss effects
        Genetic analysis
        Multi-breed additive genetic changes
    Results
        (Co)variance components
        Multi-breed additive genetic changes
        Dominance and epistatic loss effects
        Breed additive effects
        Sampling correlations
    Discussion
    Conclusions

4. Estimation of genetic effects in the presence of multicollinearity
    Abstract
    Introduction
    Material and Methods
        Data
        Predictor variables of fixed genetic effects
            Breed additive effects
            Dominance effects
            Epistatic loss effects
        Multicollinearity diagnostics
            Variance inflation factor
            Condition index
            Variance-decomposition proportions associated with the eigenvalues
        Genetic analysis
        Ridge regression
        Objective methods for selecting the ridge parameter K
            Generalized Ridge Estimator of Hoerl and Kennard (R1)
            Bootstrap in combination with cross-validation (R2)
        Mean squared error of prediction and variance inflation factor
        Bias measurement
        Comparison of across-breed estimated breeding values
            Additive-dominance models
            Additive-dominance-epistatic models
    Results
        Multicollinearity diagnostics
        Ridge parameter K
        Convergence of estimates of fixed genetic effects
        Mean squared error of prediction and variance inflation factor
        Bias measurement
        Dominance and epistatic loss effects
        Breed additive effects
        Sampling correlations
        Comparison of across-breed estimated breeding values
        Use of the same ridge parameter in subsequent genetic evaluations
    Discussion
    Conclusions

5. General Discussion
    Degree of connectedness among test groups of centrally tested beef bulls
        Practical implications
        Limitations and suggestions for further investigations
    Additive, dominance, and epistatic loss effects on pre-weaning gain in crossing of different Bos taurus breeds
        Practical implications
        Limitations and suggestions for further investigations
    Estimation of genetic effects in the presence of multicollinearity
        Practical implications
        Limitations and suggestions for further investigations

6. References


LIST OF TABLES

Table 2.1. Summary of the bull test data

Table 2.2. Correlations among PEV of the difference between EBV of bulls from different test groups (PEVD), variance of estimated differences between test group effects (VED), connectedness rating (CR) and total number of direct genetic links between test groups (GLT) for pairs of test groups (above diagonal) and for averages of each test group with all other test groups (below diagonal)

Table 2.3. Estimates of intercept, regression coefficients, and coefficient of determination (R²) of the models to predict average PEVD of pairs of test groups

Table 2.4. Estimates of intercept, regression coefficients, and coefficient of determination (R²) of the models to predict average PEVD of each test group with all other test groups

Table 3.1. Coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects for different mating systems involving two breeds, A and B

Table 3.2. Distribution of observations among coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects

Table 3.3. Mean and standard deviation (SD) of pre-weaning gain (Gain), weaning age (Age), coefficients of direct and maternal breed additive, dominance (HD and HM), and epistatic loss (ED and EM) genetic effects

Table 3.4. Estimates of (co)variance components and genetic parameters of pre-weaning gain (kg)

Table 3.5. Multi-breed additive genetic changes in pre-weaning gain per year obtained through regression of average estimated breeding values of purebred calves on birth year (Average) and through regression of estimated breeding values on contribution of each breed to the breed composition of the calves (Regression)

Table 3.6. Estimates and standard errors of direct and maternal dominance (H) and epistatic loss (E) effects on pre-weaning gain (kg)

Table 3.7. Estimates (as deviations from Angus) and standard errors of direct and maternal breed additive effects for pre-weaning gain (kg)

Table 3.8. Sampling correlations among estimates of direct (D) and maternal (M) fixed genetic effects

Table 4.1. Correlation coefficients among predictor variables of direct (D) and maternal (M) fixed genetic effects (n = 478,466)

Table 4.2. Eigenvalues of the correlation matrix among predictor variables of fixed genetic effects and corresponding condition indices

Table 4.3. Decomposition of the variance structure of the parameter estimates associated with the two largest condition indices

Table 4.4. Values of the ridge parameter (K) obtained by ridge regression methods R1 and R2, for direct and maternal genetic effects

Table 4.5. Summary of results obtained over one hundred bootstrap samples for ordinary least squares (LS) and ridge regression methods R1 and R2

Table 4.6. Estimates of direct and maternal dominance (H) and epistatic loss (E) effects on pre-weaning gain (kg), obtained by ordinary least squares (LS) and ridge regression methods R1 and R2

Table 4.7. Estimates of direct and maternal breed additive effects on pre-weaning gain (kg), as deviations from Angus, obtained by ordinary least squares (LS) and ridge regression methods R1 and R2

Table 4.8. Number of calves including records from 1986 to the indicated year, expressed as equivalent purebred calves

Table 4.9. Values of the ridge parameter (K), obtained by ridge regression methods R1 and R2, using records from 1986 to 1996


LIST OF FIGURES

Figure 2.1. Average degree of connectedness for pairs of test groups (top) and for each test group with all other test groups (bottom) on the basis of PEVD, VED, CR, and GLT

Figure 2.2. Observed relationship of average PEVD per test group with number of bulls per test group, average PEVD per test group with number of sires per test group, CR with number of bulls per test group, and GLT with number of bulls per test group

Figure 2.3. Observed relationship of average PEVD of pairs of test groups with VED, CR, and GLT

Figure 2.4. Observed relationship of average PEVD of each test group with VED, CR, and GLT

Figure 2.5. Observed relationship of average PEVD of each test group with number of bulls per test group, VED, CR, and GLT for connected and disconnected test groups

Figure 3.1. Percentage of calves, sires, and dams with 1, 2, 3, or 4 breeds in the genetic composition in the dataset containing 478,466 calves, 19,908 sires, and 234,608 dams

Figure 3.2. Number of purebred and crossbred calves, sires, and dams containing some portion of the indicated breed in the dataset including 478,466 calves, 19,908 sires, and 234,608 dams

Figure 3.3. Numbers of purebred and crossbred (expressed as equivalent to purebred) calves per breed

Figure 3.4. Multi-breed additive genetic changes in pre-weaning gain obtained through average breeding values of purebred calves per birth year (Average) and through regression of yearly breeding values on contribution of each breed to the breed composition of the calves (Regression)

Figure 4.1. Variance inflation factor (VIF) associated with predictor variables of direct and maternal dominance (H), epistatic loss (E), and breed additive effects

Figure 4.2. Convergence of the estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ridge regression method R1

Figure 4.3. Convergence of the estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ridge regression method R2

Figure 4.4. … direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2

Figure 4.5. Estimates (as deviations from AN) and standard errors of direct dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2

Figure 4.6. Estimates (as deviations from AN) and standard errors of maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2

Figure 4.7. Sampling correlations (multiplied by -1.0) between estimates of maternal dominance (HM) and direct epistatic loss (ED) effects and between estimates of direct and maternal breed additive effects given by …

Figure 4.8. Pearson and Spearman correlations, and percentages of coincidence for different proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves on the basis of ABC yielded by different models compared to model ADE-R2

Figure 4.9. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ordinary least squares, using records from 1986 to the indicated year

Figure 4.10. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ridge regression method R1, using records from 1986 to the indicated year (ridge parameter K was obtained using records from 1986 to 1996)

Figure 4.11. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ridge regression method R2, using records from 1986 to the indicated year (ridge parameter K was obtained using records from 1986 to 1996)


ABBREVIATIONS KEY

ABC = Across-breed estimated breeding value

AN = Angus

BD = Blonde d’Aquitaine

BEG = Bull estimated weight gain

BLUP = Best linear unbiased predictor

CH = Charolais

CI = Condition index

CR = Connectedness rating

D = Dominance effect

E = Epistatic loss effect

EBV = Estimated breeding value

ED = Coefficient of direct epistatic loss effect

EM = Coefficient of maternal epistatic loss effect

GLT = Total number of direct genetic links between test groups

GV = Gelbvieh

HD = Coefficient of direct dominance effect

HE = Hereford

HM = Coefficient of maternal dominance effect

LM = Limousin

LS = Ordinary least squares

MA = Maine-Anjou

MSE = Mean square error


MSEP = Mean squared error of prediction

PEV = Prediction error variance

PEVD = Average prediction error variance of the difference between EBVs

R1 = Generalized ridge estimator of Hoerl and Kennard (ridge regression method R1)

R2 = Bootstrap in combination with cross-validation (ridge regression method R2)

SA = Salers

SH = Shorthorn

SM = Simmental

TG = Test group

VED = Variance of estimated differences between test group effects

VIF = Variance inflation factor


Chapter 1

General Introduction

Genetic selection and planned crossbreeding systems are two complementary

strategies that have been applied in the beef cattle industry to generate animals with high

levels of production and efficiency under varying management conditions and market

preferences. Programs of genetic improvement taking advantage of between-breed

additive and non-additive genetic effects are now common worldwide.

A genetic goal is effectively accomplished by selection based on modern genetic

evaluation. Considering the importance of crossbreeding in beef cattle production, genetic

evaluation must consider animals of multiple breeds. Mixed model procedures,

employing an animal model, are generally used in the genetic evaluations of multi-breed

populations. To obtain highly accurate genetic evaluations and, consequently, a high

response to selection, breed additive and non-additive genetic effects must be properly

accounted for. Moreover, estimated breeding values of animals should be comparable

regardless of the breed composition and management units from which they come.

The present research focuses on some problems related to statistical methods applied

to the estimation of breeding values of animals in a multi-breed population of beef cattle,

more specifically:

(1) Estimation of the degree of connectedness among groups of centrally tested beef

bulls;


(2) Estimation of additive, dominance, and epistatic loss effects on pre-weaning gain

in crossing of different Bos taurus breeds; and

(3) Estimation of genetic effects in the presence of multicollinearity.

Central testing of beef bulls is an important component of genetic improvement

programs for beef cattle in many countries. Because selection is carried out across test

groups, evaluation of the degree of connectedness among test groups is of great concern.

With few genetic links between test groups, comparison of bulls’ EBV from different

groups is less accurate, even if the accuracy of the EBV is high within the groups

(Kennedy and Trus, 1993).

Different criteria for measuring connectedness have been proposed in the literature

(e.g., Wood et al., 1991; Foulley et al., 1992; Laloë, 1993; Kennedy and Trus, 1993; Fries,

1998; Hanocq and Boichard, 1999; Mathur et al., 2002). Ideally, PEV of comparisons

between animals or average PEV of comparisons between groups of animals (PEVD),

which is influenced by the average genetic relationship between and within management

units, should be the basis for measuring connectedness (Kennedy and Trus, 1993).

However, computing the PEV matrix is very difficult or impossible for large datasets. If

obtaining a measure of connectedness through PEVD is impossible, alternative methods

could be used to predict PEVD. In Chapter 2, three alternative methods are assessed and

compared with respect to prediction of PEVD. Models to predict PEVD, which could be

routinely used in genetic evaluation, are suggested. An indication of the degree of

connectedness among test groups of beef bulls in Ontario, Canada, is obtained. Results

from this investigation will be the basis for developing recommendations to increase the

accuracy of comparisons of bulls across test groups.


Across-herd genetic evaluation for growth traits is another significant component of

genetic improvement programs for beef cattle in many countries. Similar to genetic

evaluations of centrally tested beef bulls, across herd genetic evaluations for growth traits

are based on additive-dominance genetic models. These models are justified based on the

assumption that heterosis is mainly due to dominance effects, in agreement with results

obtained by Gregory et al. (1997) in a large beef cattle crossbreeding experiment.

Heterosis is modeled as being proportional to the probability that genes at a locus come

from different breeds, which corresponds to the breed heterozygosity. Deviations from

the linear association of heterosis with degree of heterozygosity are due to recombination

loss (Dickerson, 1969, 1973). Recombination loss (epistatic loss) is attributed to the loss

of favourable epistatic combinations present in the gametes from purebreds as a result of

long-term selection. This loss is proportional to the probability that two non-allelic genes

randomly chosen in the individual are from different breeds. Because it is difficult to

estimate dominance and epistatic loss effects separately, research studies to estimate both

dominance and epistatic loss effects in beef cattle are not abundant, particularly with field

data. However, results obtained by Arthur et al. (1999) suggest that, when data structure

allows, the inclusion of epistatic effects in the genetic evaluation model can significantly

improve the accuracy of predictions.
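The definition of breed heterozygosity given above can be illustrated with a small calculation. The following is a minimal sketch, not the procedure used in this thesis: it assumes that each parent is described by a dictionary of breed fractions (the function names and the two-breed example are hypothetical), and it computes the probability that the two alleles at a locus, one from the sire and one from the dam, come from different breeds.

```python
from typing import Dict

BreedComposition = Dict[str, float]  # breed code -> fraction; fractions sum to 1


def breed_heterozygosity(sire: BreedComposition, dam: BreedComposition) -> float:
    """Probability that the alleles received from sire and dam come from different breeds."""
    # Probability that both alleles trace back to the same breed.
    same_breed = sum(frac * dam.get(breed, 0.0) for breed, frac in sire.items())
    return 1.0 - same_breed


def offspring_composition(sire: BreedComposition, dam: BreedComposition) -> BreedComposition:
    """Expected breed composition of the progeny: half from each parent."""
    breeds = set(sire) | set(dam)
    return {b: 0.5 * sire.get(b, 0.0) + 0.5 * dam.get(b, 0.0) for b in breeds}


# Hypothetical two-breed example (Angus and Hereford).
angus = {"AN": 1.0}
hereford = {"HE": 1.0}
f1 = offspring_composition(angus, hereford)

h_f1 = breed_heterozygosity(angus, hereford)   # purebred x purebred cross
h_f2 = breed_heterozygosity(f1, f1)            # F1 x F1
h_backcross = breed_heterozygosity(f1, angus)  # F1 x Angus
```

Under these assumptions the F1 cross is fully heterozygous, while the F2 and the backcross retain half of that heterozygosity. Coefficients of epistatic loss are not sketched here because their exact definition is given later in the thesis.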

Estimates of (co)variance components, heterosis, breed effects, and additive genetic

changes have been obtained in Ontario (Miller, 1996; Sullivan et al., 1999), but there

were no available studies which separated direct and maternal dominance and epistatic

loss effects associated with breed heterozygosities. An objective reported in Chapter 3

was to obtain estimates of direct and maternal breed additive, dominance, and epistatic

loss effects for pre-weaning weight gain. (Co)variance components were also obtained


and breed additive genetic changes between 1986 and 1999 were examined. Estimates

obtained in this study can be used to update the parameters currently used in the genetic

evaluations to improve accuracy.

To fit breed additive, dominance, and epistatic loss effects, a multiple regression

equation including predictor variables such as breed compositions and breed

heterozygosities, and functions of the heterozygosities can be used. This has been

generally done by ordinary least squares methods. The interpretation of the estimates

given by ordinary least squares depends on the assumption that predictor variables are not

strongly interrelated. If the vectors of predictor variables are multicollinear, the least

squares estimates typically have large standard errors, may have signs that are opposite to

what would be expected, and are sensitive to changes in the data file and to addition or

deletion of variables in the model, making modeling very confusing. Moreover, when

taken in combination, the estimated coefficients often cancel out, indicating confounding.

In the presence of multicollinearity, the least squares estimator is not adequate because it

will be very unstable. Multicollinearity has been indicated as one of the main causes of

unexpected signs and high degree of confounding involving estimates of direct and

maternal breed additive and/or non-additive genetic effects (e.g., Kinghorn and Vercoe,

1989; Rodríguez-Almeida et al., 1997; Fries et al., 2000; Cassady et al., 2002), which can

lead to the incorrect ranking of animals based on across breed comparisons.
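The variance inflation factor (VIF), one of the diagnostics applied in Chapter 4, gives a quick numerical view of the instability described above. The following is a minimal sketch on simulated predictors, not on the data of this thesis. It assumes the standard definition in which the VIF of the jth predictor is the jth diagonal element of the inverse of the correlation matrix among predictors; the variable names and the simulated values are illustrative only.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(X, rowvar=False)   # correlation matrix among predictors
    return np.diag(np.linalg.inv(corr))   # VIF_j = [corr^-1]_jj = 1 / (1 - R_j^2)


# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
print(np.round(vif, 1))
```

In this simulated case the two nearly collinear predictors show very large VIF values, whereas the unrelated predictor has a VIF close to 1.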

To overcome difficulties caused by multicollinearity, Hoerl and Kennard (1970a,

1970b) suggested the use of the ridge regression estimator. With a suitable choice of the

ridge parameter, the ridge regression estimator gives a more precise estimate of

regression coefficients because its variance and mean squared error are smaller than those

of the least squares estimator. The fact that ridge regression estimators have been


successfully applied in dealing with multicollinearity in diverse fields, including

Chemistry, Econometrics, and Engineering (Gruber, 1998) suggests avenues for research

and application in the context of animal breeding, particularly in the analysis of multi-

breed populations of beef cattle.
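The effect of the ridge estimator on collinear predictors can be illustrated on simulated data. The following is a minimal sketch of the ordinary ridge estimator with a single ridge parameter k, that is, the solution of (X'X + kI)b = X'y. It is not the generalized ridge procedure developed in Chapter 4, and the simulated data, the function name, and the value k = 1 are assumptions made only for illustration.

```python
import numpy as np


def ridge_estimate(X: np.ndarray, y: np.ndarray, k: float) -> np.ndarray:
    """Solve (X'X + kI) b = X'y; k = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


# Two nearly collinear predictors with equal true effects.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)

# Standardize predictors and centre the response so no intercept is needed.
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

beta_ols = ridge_estimate(X, y, k=0.0)    # unstable: coefficients may diverge
beta_ridge = ridge_estimate(X, y, k=1.0)  # shrunken, more similar coefficients
print(beta_ols, beta_ridge)
```

With k > 0 the coefficient vector is shrunk, and the shrinkage is strongest along the direction in which the predictors carry almost no independent information. This is why the two ridge coefficients are closer to each other than the least squares coefficients. The choice of k is the key practical question and is addressed in Chapter 4.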

Chapter 4 presents the development of a framework, using ridge regression methods,

for obtaining stable estimates of direct and maternal breed additive, dominance, and

epistatic loss effects on pre-weaning gain when multicollinearity is of concern, which

could contribute to more accurate multi-breed genetic evaluation of beef cattle. After

identifying the causes of dependencies among predictor variables, two generalized ridge

regression methods were applied in the choice of the ridge parameter. Once the choice of

the ridge parameter was made, its reliability and validity were evaluated through

bootstrap resampling procedures in combination with cross-validation. Finally, some

results obtained with ridge regression methods were examined to further illustrate

application of ridge regression in routine large-scale genetic evaluations.

The final chapter is a general discussion of results obtained in the previous chapters.

Some practical implications of the results of this study, limitations, and suggestions for

future research are presented.


Chapter 2

Degree of connectedness among groups of

centrally tested beef bulls

V. M. Roso, F. S. Schenkel, and S. P. Miller

Published in Canadian Journal of Animal Science 2004 84: 37-47

Reproduced by permission of the Agricultural Institute of Canada

ABSTRACT - The degree of connectedness among test groups (TG) of bulls tested in

central evaluation stations from 1988 to 2000 in Ontario, Canada, was evaluated using the

methods PEVD, VED, CR, and GLT. The model used in the analysis included the effects

of breed and TG (fixed) and animal (random). PEVD was assumed the most adequate

measure of connectedness and results from the alternative methods VED, CR, and GLT

were compared relative to PEVD. Models to predict the average PEVD of pairs of TG

and the average PEVD of each TG with all other TG on the basis of VED, CR, and GLT

were developed. Results from all measures of connectedness indicated an unfavourable

trend in the degree of connectedness after 1994. The average PEVD of pairs of TG can be


better predicted on the basis of the model that includes GLT. The average PEVD of each

TG with all other TG can be better predicted on the basis of models that include either CR

or GLT. Connectedness among TG of centrally tested beef bulls can be adequately

assessed for specific pairs of TG or overall for each TG with all other TG using GLT.

Key words: accuracy, central test, genetic evaluation, harmonic mean

Abbreviations: BEG, bull estimated weight gain; CR, connectedness rating; EBV,

estimated breeding value; VED, variance of estimated differences between test group

effects; GLT, total number of direct genetic links between test groups; PEV, prediction

error variance; PEVD, average prediction error variance of the difference between

estimated breeding values; TG, test group.

INTRODUCTION

Connectedness among test groups (TG) is of interest in genetic evaluation of station-

tested beef bulls because comparisons of estimated breeding values (EBV) of bulls tested

in different groups are made. The EBV of bulls from different TG are comparable due to

use of appropriate methodology (Best Linear Unbiased Predictor, BLUP) and genetic

connectedness among groups. However, the accuracy of the comparisons depends upon

the degree of connectedness among TG. With lower connectedness between TG,

comparison of bulls’ EBV from different TG is less accurate, even if the accuracy of EBV

is high within the groups (Kennedy and Trus, 1993).

When genetic evaluation is under an animal model, connections occur through

additive genetic relationships. Hence, two TG could be connected by direct and/or


indirect genetic links. Kennedy and Trus (1993) argued that the most appropriate measure

of connectedness is the average prediction error variance of differences (PEVD) in EBV

between animals in different management units (e.g., TG), which is influenced by the

average genetic relationship between and within management units. However, computing

this statistic is extremely time consuming and not feasible for routine application.
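The PEVD statistic described above can be sketched for a small example. The code below assumes that the full matrix of prediction error (co)variances of the EBV is already available, which is exactly the step that is not feasible for large datasets. The 4-animal matrix, the group labels, and the function name are hypothetical. For a pair of animals, the prediction error variance of the difference is PEV(i) + PEV(j) - 2 PEC(i, j), and the value for two groups is the average over all pairs with one animal in each group.

```python
import numpy as np


def average_pevd(pev: np.ndarray, groups, g1, g2) -> float:
    """Average PEV of EBV differences over all pairs (animal in g1, animal in g2)."""
    groups = np.asarray(groups)
    idx1 = np.flatnonzero(groups == g1)
    idx2 = np.flatnonzero(groups == g2)
    diag = np.diag(pev)
    # PEVD for every pair: PEV_i + PEV_j - 2 * PEC_ij
    pevd = diag[idx1][:, None] + diag[idx2][None, :] - 2.0 * pev[np.ix_(idx1, idx2)]
    return float(pevd.mean())


# Hypothetical prediction error (co)variance matrix for four bulls.
pev = np.array([
    [1.0, 0.2, 0.1, 0.0],
    [0.2, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
])
groups = ["TG1", "TG1", "TG2", "TG2"]

value = average_pevd(pev, groups, "TG1", "TG2")
print(value)
```

Larger prediction error covariances between animals of different groups, which arise from genetic links, reduce the PEVD and therefore indicate better connectedness.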

When PEVD cannot be computed, Kennedy and Trus (1993) proposed to use the

variance of estimated differences between management unit effects (VED), which was

highly correlated with PEVD in their simulation study. Mathur et al. (1999) also

suggested that VED could be used as a measure of connectedness between two

management units and proposed to calculate the connectedness rating (CR), defined as

the correlation between estimated effects of two management units. According to Mathur et

al. (1999), CR is less dependent on the size and structure of management units than VED.

To calculate CR, the authors proposed an iterative method, which captures the inverse

elements for some rows and columns (corresponding to TG in the mixed model

equations, for example) of any large matrix for which a direct inverse is not possible.

Fries (1998) proposed the use of number of direct genetic links between TG (GLT) due to

common sires and dams as a method for measuring degree of connectedness among TG.
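The two statistics based on management unit effects can be written compactly once the (co)variances of the estimated TG effects are known. The following is a minimal sketch that assumes these (co)variances are available as a small matrix, for example from the block of the inverse of the mixed model equations that corresponds to TG. The 3 x 3 matrix and the function names are hypothetical. VED is the variance of the difference between two estimated effects, and CR is their correlation.

```python
import numpy as np


def ved(cov: np.ndarray, i: int, j: int) -> float:
    """Variance of the estimated difference between unit effects i and j."""
    return float(cov[i, i] + cov[j, j] - 2.0 * cov[i, j])


def connectedness_rating(cov: np.ndarray, i: int, j: int) -> float:
    """Correlation between the estimated effects of units i and j."""
    return float(cov[i, j] / np.sqrt(cov[i, i] * cov[j, j]))


# Hypothetical (co)variance matrix of estimated effects for three test groups.
cov = np.array([
    [0.50, 0.30, 0.05],
    [0.30, 0.60, 0.02],
    [0.05, 0.02, 0.40],
])

ved_01 = ved(cov, 0, 1)                    # well-connected pair
ved_02 = ved(cov, 0, 2)                    # poorly connected pair
cr_01 = connectedness_rating(cov, 0, 1)
cr_02 = connectedness_rating(cov, 0, 2)
print(ved_01, ved_02, cr_01, cr_02)
```

In this example the first pair of groups has the lower VED and the higher CR, so both statistics point to the same pair as better connected. GLT is not sketched here because it is a count of shared sires and dams whose exact definition is given later in the chapter.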

The objectives of this study were:

(1) To obtain an indication of the degree of connectedness of test groups of beef bulls

in Ontario,

(2) To assess and compare the methods VED, CR and GLT for measuring the degree

of connectedness among groups of station-tested beef bulls, and

(3) To define a model to predict the PEVD of pairs of test groups and the average

PEVD of each TG with all other TG, which could be routinely used in genetic evaluation

programs.

MATERIAL AND METHODS

Data

Data were consecutive weights of bulls tested in central evaluation stations in

Ontario, Canada, from 1988 to 2000. Bulls from multiple breeds and crossbreds, from

different herds, were delivered to test stations and subjected to an adjustment period of

28 days before start of test. Bulls were weighed every 28 days during a period of 112 or

140 days on test. A summary of the data is presented in Table 2.1.

Statistical model

Consecutive weights of bulls were used to obtain the bull estimated weight gain (BEG). A

fixed univariate linear regression of weight ($w_{ij}$) on days on test ($d_{ij}$) was fitted for each bull i

using the model $w_{ij} = \alpha_i + \beta_i d_{ij} + e_{ij}$, where $\alpha_i$ and $\beta_i$ are the intercept and

linear regression coefficient of the ith bull, respectively, and $e_{ij}$ is the random residual

term. The BEG was calculated by multiplying $\beta_i$ by the number of days on test (140 days)

and was adjusted for heterosis on the basis of the individual bull’s heterozygosity. An ad hoc

heterosis of 3% was assumed for an animal with heterozygosity of 100%, regardless of

the breeds involved (Sullivan et al., 1999). Then, BEG was used as an observation in the

following genetic evaluation model:

$$BEG_{ij} = \sum_{k=1}^{14} b_k B_{ik} + g_j + a_{ij} + e_{ij},$$

where

$BEG_{ij}$ is the estimated weight gain of the ith bull in the jth TG;

$b_k$ is the linear regression coefficient on the breed composition for the kth

breed;

$B_{ik}$ is the contribution of the kth breed to the breed composition of the ith bull;

$g_j$ is the fixed effect of the jth TG;

$a_{ij}$ is the random additive genetic effect of the ith bull in the jth TG;

$e_{ij}$ is the random residual effect.

Random effects a and e were assumed independent with covariance matrices equal to

$A\sigma_a^2$ and $I\sigma_e^2$, respectively. All available pedigree information was incorporated into the

additive numerator relationship matrix A. The required elements for calculating VED, CR

and PEVD were obtained using PEST (Groeneveld, 1990), assuming a heritability

of 0.43, which was previously estimated for the same data set (Sullivan et al., 1999).
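The per-bull regression step described at the start of this subsection could be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the example weights, and in particular the multiplicative form of the heterosis adjustment are assumptions, since the text gives only the 3% value at 100% heterozygosity.

```python
import numpy as np

DAYS_ON_TEST = 140      # test length stated in the text
FULL_HETEROSIS = 0.03   # ad hoc 3% heterosis at 100% heterozygosity (from the text)


def bull_estimated_gain(days, weights, heterozygosity=0.0):
    """Return BEG (kg) for one bull from its consecutive test weights.

    Fits w = alpha + beta * d by ordinary least squares and scales the
    slope to a 140-day test. The multiplicative heterosis adjustment is
    an ASSUMPTION made for this sketch; the source does not give its form.
    """
    beta, _alpha = np.polyfit(days, weights, deg=1)  # slope first, then intercept
    raw_gain = beta * DAYS_ON_TEST
    return raw_gain / (1.0 + FULL_HETEROSIS * heterozygosity)


# Hypothetical bull weighed every 28 days and gaining exactly 1.7 kg/day.
days = np.arange(0, 141, 28)
weights = 300.0 + 1.7 * days
beg_purebred = bull_estimated_gain(days, weights)                 # about 238 kg
beg_f1 = bull_estimated_gain(days, weights, heterozygosity=1.0)   # about 231 kg
```

The resulting BEG values would then enter the genetic evaluation model as observations.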

Measures of the degree of connectedness

The degree of connectedness among TG was measured using the following methods:

(1) Prediction error variance of differences in EBV of bulls ($\hat{a}$) from different test

groups (PEVD). The PEVD of two animals, one from the ith and the other from the jth TG,

was given by

$$PEVD_{ij} = \mathrm{var}(\hat{a}_i - a_i) + \mathrm{var}(\hat{a}_j - a_j) - 2\,\mathrm{cov}(\hat{a}_i - a_i,\ \hat{a}_j - a_j).$$

(2) Variance of estimated differences between test group effects ($\hat{g}$) (VED). The

VED between the ith and the jth TG was given by

$$VED_{ij} = \mathrm{var}(\hat{g}_i) + \mathrm{var}(\hat{g}_j) - 2\,\mathrm{cov}(\hat{g}_i, \hat{g}_j).$$

(3) Connectedness rating (CR), defined as the correlation between estimated effects

of TG (Mathur et al., 2002). The CR between the ith and the jth TG was given by

$$CR_{ij} = \frac{\mathrm{cov}(\hat{g}_i, \hat{g}_j)}{\sqrt{\mathrm{var}(\hat{g}_i)\,\mathrm{var}(\hat{g}_j)}} \times 100.$$

(4) Total number of direct genetic links between test groups (GLT), defined as the

links between TG due to common sires and dams (Fries, 1998). The basic steps of the

algorithm and the criteria used for computing GLT are:

1. Calculate the number of direct genetic links between pairs of TG due to common

sires and dams. Then, for each TG, calculate the overall number of genetic links

due to sires (GLs) and dams (GLd) with all other TG.

2. Calculate the total number of genetic links (GLT) as the sum of GLs and GLd.

3. Identify the TG with the largest GLT (“main TG”).

4. Identify all TG directly and/or indirectly connected to the “main TG”. These groups

constitute the “principal mass”. TG with fewer than 10 GLT and/or fewer than three

different parents (sires + dams) were considered disconnected from the “principal mass”

and had their GLT set to zero. Other criteria could be used.

5. Repeat step 4 until the set of connected TG is the same as in the previous run.

6. Save the records that were considered connected to the “principal mass”. TG

disconnected from the “principal mass” have GLT equal to zero and should be rerun

through the program. This procedure allows identification of isolated subsets of

connected TG.
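The six steps above can be sketched in a few lines of Python. This is an illustrative reading of the algorithm rather than the original program: the function names and record layout are hypothetical, and counting one link per parent shared by two TG is an assumption, because the exact counting rule of Fries (1998) is not spelled out here. Sire and dam links are tagged separately, so their sum corresponds to GLs + GLd.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_links(records):
    """Step 1: direct links between TG pairs; records are (tg, sire, dam).

    ASSUMPTION: each parent shared by two TG counts as one link.
    """
    parents = defaultdict(set)
    for tg, sire, dam in records:
        group = parents[tg]              # registers the TG even if parents are unknown
        if sire is not None:
            group.add(("sire", sire))
        if dam is not None:
            group.add(("dam", dam))
    links = {}
    for a, b in combinations(sorted(parents), 2):
        shared = len(parents[a] & parents[b])
        if shared:
            links[(a, b)] = shared
    return links, parents


def principal_mass(records, min_links=10, min_parents=3):
    """Steps 2 to 5: iterate until the set of connected TG stops changing."""
    links, parents = pairwise_links(records)
    active = set(parents)
    while True:
        glt = dict.fromkeys(active, 0)   # step 2: GLT = sire links + dam links
        neighbours = defaultdict(set)
        for (a, b), n in links.items():
            if a in active and b in active:
                glt[a] += n
                glt[b] += n
                neighbours[a].add(b)
                neighbours[b].add(a)
        keep = {t for t in active
                if glt[t] >= min_links and len(parents[t]) >= min_parents}
        if not keep:
            return set(), glt
        main = max(sorted(keep), key=glt.get)   # step 3: the "main TG"
        mass, stack = {main}, [main]            # step 4: direct and indirect links
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt in keep and nxt not in mass:
                    mass.add(nxt)
                    stack.append(nxt)
        if mass == active:                      # step 5: unchanged since last run
            return mass, glt
        active = mass


# Toy records (hypothetical); thresholds are lowered to suit the tiny example.
records = [("A", "s1", "d1"), ("A", "s2", "d2"), ("B", "s1", "d3"), ("B", "s3", "d4"),
           ("C", "s3", "d5"), ("C", "s4", "d6"), ("D", "s8", "d8"), ("D", "s9", "d9")]
mass, glt = principal_mass(records, min_links=1, min_parents=3)
```

In this toy case A and C are connected only indirectly, through B, and D is left out of the principal mass; step 6 would rerun the records of D (and of any other excluded TG) to look for isolated connected subsets.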

The average PEVD was assumed as the basic measure of connectedness of a TG,

following Kennedy and Trus (1993). This statistic was considered the most appropriate

measure of connectedness and the alternative methods VED, CR, and GLT were

compared relative to PEVD. The degree of connectedness was calculated for pairs of TG

and for each TG with all other TG. Connectedness between pairs of TG indicates

accuracy in comparing EBV of animals from two TG. Average connectedness of each TG

with all others indicates the average accuracy in comparing EBV of an animal with

animals in all other TG. This measure is of greater importance in the evaluation of beef

bulls in a station test because selection generally considers all TG instead of a few very

well connected TG. High average connectedness of each TG with all other TG allows

effective selection across all TG.
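The variance-based measures defined above (PEVD, VED, and CR) are all functions of the inverse of the mixed model coefficient matrix. The sketch below shows that relationship on a toy pedigree. It is not the PEST computation used in the study: the breed regression is omitted, the residual variance is set to 1, and all function names and data are hypothetical.

```python
import numpy as np


def relationship_matrix(sire, dam):
    """Tabular method for A; parents precede offspring, -1 means unknown."""
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            a_ij = 0.0
            if s >= 0:
                a_ij += 0.5 * A[j, s]
            if d >= 0:
                a_ij += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a_ij
    return A


def prediction_error_covariances(sire, dam, rec_animal, rec_tg, h2=0.43, sigma2_e=1.0):
    """Return (Vg, Va): covariance blocks for TG effects and animal effects."""
    n_rec, n_animals, n_tg = len(rec_animal), len(sire), max(rec_tg) + 1
    X = np.zeros((n_rec, n_tg))
    Z = np.zeros((n_rec, n_animals))
    X[np.arange(n_rec), rec_tg] = 1.0
    Z[np.arange(n_rec), rec_animal] = 1.0
    lam = (1.0 - h2) / h2                        # sigma2_e / sigma2_a
    A_inv = np.linalg.inv(relationship_matrix(sire, dam))
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + lam * A_inv]])
    V = np.linalg.inv(C) * sigma2_e              # direct inverse: small data only
    return V[:n_tg, :n_tg], V[n_tg:, n_tg:]


def ved(Vg, i, j):
    return Vg[i, i] + Vg[j, j] - 2.0 * Vg[i, j]


def connectedness_rating(Vg, i, j):
    return 100.0 * Vg[i, j] / np.sqrt(Vg[i, i] * Vg[j, j])


def average_pevd(Va, animals_i, animals_j):
    """Mean PEVD over all pairs with one animal from each TG."""
    return float(np.mean([Va[a, a] + Va[b, b] - 2.0 * Va[a, b]
                          for a in animals_i for b in animals_j]))


# Toy pedigree: animals 0 and 1 are sires, 2 to 7 are tested bulls (dams unknown).
sire = [-1, -1, 0, 0, 1, 1, 1, 0]
dam = [-1] * 8
rec_animal = [2, 3, 4, 5, 6, 7]
rec_tg = [0, 0, 0, 1, 1, 1]        # both sires have sons in both test groups
Vg, Va = prediction_error_covariances(sire, dam, rec_animal, rec_tg)
ved_01 = ved(Vg, 0, 1)
cr_01 = connectedness_rating(Vg, 0, 1)
pevd_01 = average_pevd(Va, [2, 3, 4], [5, 6, 7])
```

The direct inverse is the step that becomes infeasible for large data sets, which is the motivation for the cheaper alternatives compared in this chapter.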

As previously indicated, the GLT of a TG is the number of direct genetic links of the

TG with all other TG. Obviously many pairs of TG that do not have any direct genetic

links are indirectly connected and, consequently, can have high accuracy of comparisons

of EBV between them. For this reason, the number of direct genetic links between pairs

of TG is inadequate to indicate the degree of connectedness between pairs of TG. The

arithmetic mean of GLT of each pair of TG is also inadequate because pairs of TG with

equal arithmetic mean can have very different degrees of connectedness. A potentially

adequate measure of connectedness between pairs of TG could be obtained through the

harmonic mean of the GLT. This measure has the property of discriminating among pairs

of TG with different GLT, penalizing those expected to be more poorly connected. As a

consequence, a stronger relationship of PEVD with harmonic means than with

arithmetic means of GLT may be expected. The harmonic mean of GLT was used in the

prediction of average PEVD of pairs of TG. The harmonic mean of GLT of TG i and TG j

($\overline{GLT}_{ij}$) was given by

$$\overline{GLT}_{ij} = \frac{2}{\dfrac{1}{GLT_i} + \dfrac{1}{GLT_j}},$$

where $GLT_i$ and $GLT_j$ are the GLT of the ith and jth TG with all other TG, respectively.

The harmonic mean is always smaller than the arithmetic mean unless the GLT of the

two TG are identical. When the GLT of a TG was equal to zero, which means the TG is

not connected to the “principal mass”, a harmonic mean equal to zero was assumed.
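A short numerical illustration may help show why the harmonic mean discriminates between pairs that the arithmetic mean treats alike. The function name and the GLT values below are hypothetical; the zero rule follows the convention stated above.

```python
def harmonic_mean_glt(glt_i, glt_j):
    """Harmonic mean of the GLT of two test groups (0 if either is disconnected)."""
    if glt_i == 0 or glt_j == 0:
        return 0.0
    # Algebraically equal to 2 / (1/glt_i + 1/glt_j).
    return 2.0 * glt_i * glt_j / (glt_i + glt_j)


# Two pairs with the same arithmetic mean (500) but different balance.
balanced = harmonic_mean_glt(500, 500)      # 500.0
unbalanced = harmonic_mean_glt(100, 900)    # 180.0
disconnected = harmonic_mean_glt(0, 700)    # 0.0
```

The unbalanced pair is pulled toward its weaker member, which is the penalty on poorly connected TG described in the text.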

The statistical analyses to define the models for predicting PEVD were performed

using the general linear models procedure (GLM) of the SAS statistical software (SAS

Institute Inc., 1990). The R2 of the models and the level of significance (P < 0.05) of each

effect considered were the criteria used to determine the final models. When segmented

polynomial regressions were used, the knots (junction points between segments) were

determined based on maximization of R2 of the model.
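The knot selection for a segmented (quadratic-quadratic) regression can be sketched as a grid search over candidate knots, keeping the one with the largest R2. The analyses in this study were run with SAS GLM; the NumPy version below is only an illustration, and the function names, candidate grid, and synthetic data are assumptions.

```python
import numpy as np


def fit_segmented_quadratic(x, y, knot):
    """Fit y = a + b1*x + b2*x^2 + b3*Z, with Z = 0 below the knot
    and Z = (x - knot)^2 at or above it. Returns (coefficients, R2)."""
    z = np.where(x < knot, 0.0, (x - knot) ** 2)
    design = np.column_stack([np.ones_like(x), x, x ** 2, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def best_knot(x, y, candidates):
    """Return the candidate knot that maximizes R2."""
    return max(candidates, key=lambda k: fit_segmented_quadratic(x, y, k)[1])


# Synthetic, noise-free data with a true knot at x = 3 (hypothetical values).
x = np.linspace(0.0, 6.0, 61)
y = 1.0 + 2.0 * x - 0.5 * x ** 2 + 0.8 * np.where(x < 3.0, 0.0, (x - 3.0) ** 2)
knot = best_knot(x, y, [1.0, 2.0, 3.0, 4.0, 5.0])
```

With real data the grid would span the observed range of the predictor, and a model with two knots would add a second truncated term of the same form.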

RESULTS

Connectedness

The average degrees of connectedness among TG measured by PEVD and the

alternative measures VED, CR, and GLT were 1599 ± 58, 286 ± 132, 1.23 ± 1.28, and

707 ± 503 for pairs of TG and 1726 ± 41, 286 ± 93, 1.21 ± 0.51, and 709 ± 690 for each

TG with all other TG, respectively. The overall results over the years are depicted in

Figure 2.1. Small values of PEVD and VED, and large values of CR and GLT are

desirable, because they indicate higher levels of connectedness among TG. All measures

of connectedness showed the same trend, that is, an increase in the degree of

connectedness from 1988 to 1994 and a substantial decrease after 1994. The highest

PEVD and VED, and the smallest CR and GLT were observed in 2000 (last year with

available information at the time of this research).

Prediction of PEVD on the basis of VED, CR, and GLT

Correlations among PEVD, VED, CR and GLT for pairs of TG and for averages of

TG with all other TG are presented in Table 2.2. In general the correlations had moderate

to high magnitude. The correlation between PEVD and VED was 0.71 both for pairs of

TG and average per TG, in contrast with the almost perfect correlation obtained by

Kennedy and Trus (1993) in their simulation study. The coefficient of correlation

measures only the strength of the linear relationship between two variables. Because a

better indication of the true relationship of PEVD with the alternative methods was

needed for defining the models to predict PEVD, the observed relationships between

PEVD and the other variables were graphically analyzed. The relationship of PEVD with

both number of bulls and number of sires per TG was also analyzed.

As shown in Figure 2.2, the observed relationship of average PEVD per TG with both

number of bulls and number of sires per TG had the same pattern. By observation, TG

with more than approximately 40 bulls or 20 sires were associated with values of PEVD

smaller than 1750, otherwise TG showed large variation in PEVD. The variation depends

on the genetic relationship between groups, which is not a direct function of number of

bulls or number of sires per TG. Because TG with a small number of bulls or a small

number of sires showed large variation in PEVD, which indicates large variation in the

degree of connectedness of these groups, neither number of bulls (size of the group) nor

number of sires per TG is a good predictor of the degree of connectedness between TG.

Figure 2.2 also shows the relationship of both CR and GLT with number of bulls per

TG. Although a large variation in the degree of connectedness was indicated by PEVD

when the size of TG was small, CR was strongly associated with number of bulls per TG

over the whole range of TG size. CR decreased linearly when the size of TG became

smaller than approximately 40 bulls. Mathur et al. (2002) reported a similar trend in the

application of CR for measuring connectedness in the Canadian Centre for Swine

Improvement. VED seemed to be even more dependent on the size of TG than CR:

TG with fewer than 25 bulls were associated with increasingly higher VED (data not

shown). In contrast, GLT showed large variation across the range of TG sizes

(Figure 2.2). Even some small TG had large GLT, which could result in these TG having high

accuracy of comparisons.

The observed relationships of PEVD with VED, CR and GLT are depicted in Figure

2.3 for pairs of TG and in Figure 2.4 for the average of each TG with all other TG. In

both cases, PEVD and VED were linearly, but not strongly, associated. On the other

hand, the relationships of both CR and GLT with PEVD were curvilinear. When GLT of

pairs of TG were represented by their arithmetic mean, large variation in PEVD was

observed at the same level of GLT (Figure 2.3). However, when GLT of pairs of TG were

represented by their harmonic mean, a stronger relationship with PEVD was observed.

Therefore, in the prediction of PEVD of pairs of TG, superior results can be expected

using harmonic mean instead of arithmetic mean of GLT. Figures 2.3 and 2.4 also

indicate that averages of CR smaller than approximately one and GLT smaller than

approximately 250 per TG were associated with increasingly higher PEVD.

The information provided by the correlations and graphical analyses was used to

define the models for predicting PEVD. Initially, VED, CR, GLT, number of bulls per

TG, number of sires per TG, and the ratio of number of bulls per sire per TG were

considered. In the final models, however, only those with significant effect (P < 0.05)

were kept.

The final models to predict the average PEVD of pairs of TG and the average PEVD

of each TG with all other TG based on VED, CR, and GLT were the following:

(1) Average PEVD of pairs of TG

(1a) On the basis of VED

The observed average PEVD of pairs of TG was modeled by a linear regression on

VED and a quadratic regression on the ratio of harmonic means of number of bulls and

number of sires of pairs of TG.

$$PEVD_{ij} = \alpha + \beta_1 VED_{ij} + \beta_2 (NB/S)_{ij} + \beta_3 (NB/S)_{ij}^2 + e_{ij},$$

where

$PEVD_{ij}$ is the observation of the average PEV of the difference between EBV of bulls

in the ith TG and EBV of bulls in the jth TG;

$\alpha$ is the intercept;

$VED_{ij}$ is the variance of estimated differences between the ith and the jth TG;

$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the

ith and jth TG;

$\beta_1$, $\beta_2$, and $\beta_3$ are the regression coefficients;

$e_{ij}$ is the residual associated with PEVD of the ith and jth TG.

(1b) On the basis of CR

The observed average PEVD of pairs of TG was modeled using a quadratic-quadratic

polynomial regression on CR and a quadratic regression on the ratio of harmonic means

of number of bulls and number of sires of pairs of TG.

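$$PEVD_{ij} = \alpha + \beta_1 CR_{ij} + \beta_2 CR_{ij}^2 + \beta_3 Z + \beta_4 (NB/S)_{ij} + \beta_5 (NB/S)_{ij}^2 + e_{ij},$$

where

$CR_{ij}$ is the connectedness rating between the ith and the jth TG;

$Z = 0$ if $CR_{ij} < 1.9$, or $Z = (CR_{ij} - 1.9)^2$ otherwise;

$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the

ith and jth TG;

$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are the regression coefficients.
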
(1c) On the basis of GLT

The observed average PEVD of pairs of TG was modeled using a quadratic-quadratic

polynomial regression on the harmonic mean of GLT of pairs of TG and a quadratic

regression on the ratio of harmonic means of number of bulls and number of sires of pairs

of TG.

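$$PEVD_{ij} = \alpha + \beta_1 \overline{GLT}_{ij} + \beta_2 \overline{GLT}_{ij}^2 + \beta_3 Z + \beta_4 (NB/S)_{ij} + \beta_5 (NB/S)_{ij}^2 + e_{ij},$$

where

$\overline{GLT}_{ij}$ is the harmonic mean of the GLT of the ith and the jth TG;

$Z = 0$ if $\overline{GLT}_{ij} < 550$, or $Z = (\overline{GLT}_{ij} - 550)^2$ otherwise;

$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the

ith and jth TG;

$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are the regression coefficients.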


(2) Average PEVD of each TG with all other TG

(2a) On the basis of VED

The observed average PEVD of each TG with all other TG was modeled by a linear

regression on VED and a quadratic regression on number of sires per TG.

$$PEVD_i = \alpha + \beta_1 VED_i + \beta_2 S_i + \beta_3 S_i^2 + e_i,$$

where

$PEVD_i$ is the observation of the average PEV of the difference between EBV of bulls

in the ith TG and EBV of bulls in all other TG;

$VED_i$ is the average variance of estimated differences between the ith TG and all other

TG;

$S_i$ is the number of sires represented in the ith TG;

$\beta_1$, $\beta_2$, and $\beta_3$ are the regression coefficients;

$e_i$ is the residual associated with PEVD of the ith TG.

(2b) On the basis of CR

The observed average PEVD of each TG with all other TG was modeled using a

quadratic-quadratic polynomial regression on CR, a quadratic regression on number of

sires, and a quadratic regression on the ratio of number of bulls per sire.

$$PEVD_i = \alpha + \beta_1 CR_i + \beta_2 CR_i^2 + \beta_3 Z + \beta_4 S_i + \beta_5 S_i^2 + \beta_6 (NB/S)_i + \beta_7 (NB/S)_i^2 + e_i,$$

where

$CR_i$ is the average connectedness rating of the ith TG with all other TG;

$Z = 0$ if $CR_i < 1.15$, or $Z = (CR_i - 1.15)^2$ otherwise;

$(NB/S)_i$ is the average ratio of number of bulls per sire represented in the ith TG;

$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$, $\beta_6$, and $\beta_7$ are the regression coefficients.


(2c) On the basis of GLT

The observed average PEVD of each TG with all other TG was modeled using a

quadratic-quadratic-quadratic polynomial regression on GLT, a linear regression on

number of sires and a quadratic regression on the ratio of number of bulls per sire.

$$PEVD_i = \alpha + \beta_1 GLT_i + \beta_2 GLT_i^2 + \beta_3 Z_1 + \beta_4 Z_2 + \beta_5 S_i + \beta_6 (NB/S)_i + \beta_7 (NB/S)_i^2 + e_i,$$

where

$GLT_i$ is the total number of direct genetic links between the ith TG and all other TG;

$Z_1 = 0$ if $GLT_i < 200$, or $Z_1 = (GLT_i - 200)^2$ otherwise;

$Z_2 = 0$ if $GLT_i < 800$, or $Z_2 = (GLT_i - 800)^2$ otherwise;

$(NB/S)_i$ is the average ratio of number of bulls per sire represented in the ith TG;

$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$, $\beta_6$, and $\beta_7$ are the regression coefficients.


Estimates of parameters and coefficient of determination (R2) of the models are

presented in Table 2.3 for prediction of average PEVD of pairs of TG and in Table 2.4 for

prediction of average PEVD of each TG with all other TG. The R2 of the models to

predict average PEVD of each TG were higher than the R2 of the models to predict PEVD

of pairs of TG on the basis of VED, CR and GLT. These results were expected because

extreme values observed in the pairwise comparisons were averaged out, reducing the

variation in PEVD.

The R2 of the model to predict average PEVD of pairs of TG on the basis of VED was

equal to 0.53 and VED accounted for 51% (partial R2) of the total variation in PEVD. In

the model to predict average PEVD of pairs of TG on the basis of CR, the R2 was equal to

0.50 and CR accounted for 49% of total variation in PEVD. R2 of 0.72 was obtained in

the model that considered GLT, which accounted for 71% of the total variation in average

PEVD (Table 2.3).
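As an illustration of how a fitted model might be applied, the sketch below evaluates the VED-based model for pairs of TG with the rounded estimates printed in Table 2.3. The function name is hypothetical, the VED input is the reported average for pairs of TG (286), and the bulls-per-sire ratio of 2.0 is an arbitrary example value. Because the published coefficients are rounded, the result should be read as approximate.

```python
# Rounded estimates copied from Table 2.3 (model on the basis of VED).
INTERCEPT = 1452.3404
B_VED = 0.4036
B_NBS = 2.6160
B_NBS2 = -0.0271


def predict_pevd_from_ved(ved, nb_per_sire):
    """Predicted average PEVD of a pair of TG from VED and the NB/S ratio."""
    return INTERCEPT + B_VED * ved + B_NBS * nb_per_sire + B_NBS2 * nb_per_sire ** 2


# VED = 286 is the reported pairwise average; NB/S = 2.0 is a hypothetical input.
predicted = predict_pevd_from_ved(ved=286.0, nb_per_sire=2.0)   # about 1572.9
```

The same pattern would extend to the CR and GLT models by adding the squared and knot terms defined for them.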

In the models to predict average PEVD of each TG with all other TG, the R2 of the

model based on VED was equal to 0.55 and VED accounted for 50% of the total variation

in PEVD. In the model to predict PEVD on the basis of CR, the R2 was equal to 0.82 and

CR accounted for 73% of total variation in PEVD. R2 of 0.79 was obtained in the model

that considered GLT, which accounted for 76% of the total variation in PEVD (Table

2.4). The R2 increased to 0.82 when GLT also included the genetic links due to

grandparents (data not shown).

Simulation of disconnected test groups

In the data set, on the basis of GLT, there was only one completely disconnected TG.

Thus, to evaluate the effect of complete disconnectedness, 36 TG had sire and dam

identifications modified to generate completely disconnected TG, covering a range of TG

sizes from very small to large (6 to 183 bulls).
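The modification of parent identifications could be done along the following lines. This is a hypothetical sketch, not the procedure actually used: it assumes records of the form (bull, TG, sire, dam) and gives every bull in a target TG its own dummy sire and dam, so that the bulls are unrelated to each other and to all other TG.

```python
def disconnect_test_groups(records, target_tgs):
    """Return a copy of records with unique dummy parents in the target TG.

    records: list of (bull_id, tg, sire_id, dam_id) tuples.
    ASSUMPTION: dummy identifiers of the form 'dummy_sire_<bull>' do not
    collide with real identifiers in the pedigree.
    """
    modified = []
    for bull, tg, sire, dam in records:
        if tg in target_tgs:
            sire, dam = f"dummy_sire_{bull}", f"dummy_dam_{bull}"
        modified.append((bull, tg, sire, dam))
    return modified


# Toy records (hypothetical): TG "B" shares sires s1 and s2 with TG "A".
records = [("b1", "A", "s1", "d1"), ("b2", "A", "s2", "d2"),
           ("b3", "B", "s1", "d3"), ("b4", "B", "s2", "d4")]
modified = disconnect_test_groups(records, {"B"})
```

After this step the GLT of each modified TG is zero by construction, while VED, CR, and PEVD have to be recomputed from the modified pedigree.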

Because there were no relationships among bulls within the created disconnected TG,

accuracy of bull EBV from disconnected TG would increase only with the size of the

group. Figure 2.5 shows that increasing the size of disconnected groups reduced the

average PEVD of each TG with all other TG from 1950, in a group with only 6 bulls, to

an asymptotic minimum value around 1850, when 120 bulls were in the TG. Kennedy

and Trus (1993) showed that relationships among bulls within disconnected TG would

increase the PEV of comparisons of EBV across TG. Therefore, connected TG with

average PEVD greater than or equal to 1850 would behave similarly to large

disconnected TG of unrelated bulls with respect to PEVD.

Disconnected TG were easily identified through GLT because their GLT was equal to zero.

However, the VED and CR of those disconnected TG varied between 164 and 739 and

between 0.27 and 1.10, respectively (Figure 2.5). Therefore, completely disconnected TG

presented a large range of VED and CR values and could not be distinguished from

connected TG on the basis of these two measures.

DISCUSSION

The genetic evaluation of bulls tested in central evaluation stations in Ontario,

Canada, is currently performed using an individual animal model. With such a model,

connections among TG occur through additive genetic relationships. Accurate

comparison of estimated breeding values between animals in different groups is necessary

to provide reliable ranking of animals across TG. The accuracy of comparison between

animals in different TG is higher if groups are well connected.

For a bull test station to operate in Ontario, requirements for a minimum

number of bulls (12) and a minimum number of sires (4) per TG must be met. Nevertheless,

results of the current study have shown that these requirements were not sufficient to

maintain a high level of connectedness among TG.

Kennedy and Trus (1993) stated that PEV of comparisons between animals or average

PEV of comparisons between groups of animals (PEVD) should be the basis of the

measurement of connectedness. However, computing the PEV matrix is very difficult or

impossible for large data sets. Approximate methods for obtaining diagonal elements of

the PEV matrix of large data sets have been developed (Misztal and Wiggans, 1988;

Meyer, 1989), but they generally do not provide the required off-diagonal elements to

obtain PEVD. If obtaining a measure of connectedness through PEVD is not possible,

alternative methods could be used to predict PEVD and, consequently, provide a measure

of degree of connectedness among management units.

Different criteria for measuring connectedness have been proposed in the literature.

Wood et al. (1991) compared the effectiveness of different breeding programs for

evaluation of pigs in test stations, using only the diagonal elements of the PEV matrix to

measure connectedness. Foulley et al. (1992) proposed calculating the ratio of the

determinants of PEV matrices with and without management unit (e.g., TG) in the model.

Laloë (1993) extended the concept of individual coefficient of determination for

measuring the overall precision of a genetic evaluation using linear mixed model

methodology. However, the use of such criteria becomes impossible if the analysis

involves a large number of animals. In this case, approximations or simplifications

similar to those presented by Foulley et al. (1992) were suggested. The concept of

coefficient of determination was also used by Hanocq and Boichard (1999) for measuring

connectedness among breeding studs in the French Holstein cattle population. However,

none of these measurements of connectedness were feasible for implementation in very

large-scale genetic evaluation.

In the current investigation three alternative measures of connectedness (VED, CR,

and GLT) were studied and used in models to predict PEVD. Models with CR and GLT

produced better results than the model with VED in the prediction of average PEVD of

each TG with all other TG, explaining high proportions of the total variance in PEVD.

Comparing the partial coefficient of determination, GLT accounted for a higher

proportion of PEVD variability than VED and CR. The effect of number of sires per TG

and ratio of number of bulls per sire had a small impact on PEVD. In the prediction of

average PEVD of pairs of TG, GLT was clearly superior to VED and

CR.

Genetic links between TG were mainly (94.5%) due to common

sires. An additional analysis, in which GLT also considered the genetic links due to

common grandparents, showed a small increase (0.03, from 0.79 to 0.82) in the R2 of the model to predict

average PEVD of each TG on the basis of GLT. These results suggested that the most

important relationships were accounted for via common sires and dams among TG, in

agreement with Hanocq and Boichard (1999). Before other generations are considered in the

calculation of GLT, the extra computational cost must be weighed against the increase in the accuracy of

prediction of PEVD. In the present study the direct genetic links due to

common sires and dams were enough to provide a sufficiently accurate prediction of

PEVD, and the gain in accuracy obtained through additional

generations did not compensate for the increased computing cost.

When completely disconnected TG were simulated, VED, CR, and GLT showed a

different pattern. On the basis of VED and CR, it was not possible to differentiate

completely disconnected TG from connected ones, because large disconnected TG had

VED and CR values that overlapped those from connected TG. However, EBV of bulls

from completely disconnected TG should not be compared with EBV of bulls from other

TG, except when it is possible to assume that genetic levels among management units are

identical. In general, this strong assumption does not hold in industry-wide genetic

evaluation.

Although VED and CR are computationally less demanding than PEVD, the effort to

calculate these statistics is still substantial, which can jeopardize the application of these

methods when a very large number of TG is involved. Because GLT is less demanding, it

could easily be calculated routinely.

The use of GLT allowed the identification of disconnected TG (without genetic links).

Hence, an assessment of the quality of the connectedness of a TG could potentially be

obtained before beginning an evaluation by calculating GLT. GLT is less dependent on

the size of TG and would not necessarily favour a large TG, because relatively small TG

may have large GLT and, consequently, low average PEVD.

In this study the degree of connectedness among TG was evaluated using PEVD,

VED, CR and GLT. Results obtained by all measures of connectedness indicated that TG

are becoming less connected and, consequently, the accuracy of comparisons of EBV of

bulls in different TG is decreasing. The period after 1994 was markedly poorer with

regard to connectedness, reaching the worst level in 2000 (last year evaluated). The

beginning of this period coincides with a significant change in the structure of bull testing

in Ontario, when larger stations running under contract with the Ontario Ministry of

Agriculture and Food were replaced with private groups. These private groups commonly

represent fewer herds and they tend to be smaller and less connected than their contract

predecessors.

From the PEVD predicted on the basis of VED, CR, and GLT, it is possible to

anticipate that increasing values of VED and decreasing values of CR and GLT

relative to those observed in 2000 would reduce the accuracy of

comparisons and, consequently, compromise potential genetic gain. To

reverse the current trend in connectedness and increase the accuracy of

comparisons, recommendations must be developed. Increasing the use of common sires

with high genetic values can increase connectedness among TG, besides promoting

genetic improvement among herds. In addition, GLT could be rapidly determined when

groups of bulls are formed and decisions could be made to increase the number of genetic

links among TG, allowing accurate comparison of EBV across TG.

Kennedy and Trus (1993) showed that connectedness increases with relationship

across groups, while it decreases when the within group relationship increases. Similar

results were observed by Hanocq and Boichard (1999). The increase of genetic

connectedness among TG reduces PEV of comparison of animals in different TG.

However, according to Kennedy and Trus (1993), “minimization of PEV does not

necessarily maximize rate of genetic improvement because it may come at a cost of

reduced intensity of selection associated with selection among related as opposed to

unrelated individuals”. Therefore, to maximize genetic gain, a balance between

connectedness and intensity of selection should be attained.

The methods for measuring connectedness evaluated in the current investigation are

dependent on the particular structure of the data. Further studies using other bull test data

sets with different structures are warranted.

CONCLUSIONS

The current trend in the accuracy of comparisons of bulls tested in different test

groups in Ontario is not favourable. All measures of connectedness studied showed a

decrease in the degree of connectedness among test groups after 1994.

Average PEVD of pairs of test groups can be more accurately predicted on the basis

of the model that includes GLT than on the basis of models that include VED or CR.

Average PEVD of each test group with all other TG can be more accurately predicted on

the basis of models that include either CR or GLT.

GLT is not excessively demanding computationally and allows

completely disconnected test groups to be distinguished from connected ones. For these reasons, GLT seems

to be a good alternative to be routinely used for measuring the degree of connectedness

among test groups with the aim of improving the accuracy of comparison of bulls’ EBV

across test groups in central evaluation stations.

Table 2.1. Summary of the bull test data

Number of bulls 26,068

Number of animals in the pedigree 58,826

Number of test groups 583

Number of breeds 14

Number of purebred bulls 23,279

Number of crossbred bulls 2,789

Number of bulls per test group a 45 ± 36

Number of sires per test group 23 ± 21

Number of test groups per year 45 ± 10

Average starting age (days) 240 ± 23

Average BEG (kg) b 238 ± 37

a Average ± standard deviation.

b Bull estimated weight gain.

Table 2.2. Correlations among PEV of the difference between EBV of bulls from

different test groups (PEVD), variance of estimated differences between test group

effects (VED), connectedness rating (CR) and total number of direct genetic links

between test groups (GLT) for pairs of test groups (above diagonal) and for

averages of each test group with all other test groups (below diagonal)

        PEVD     VED      CR       GLT
PEVD    -        0.71     –0.45    –0.71
VED     0.71     -        –0.51    –0.68
CR      –0.70    –0.85    -        0.55
GLT     –0.66    –0.64    0.86     -

Table 2.3. Estimates of parameters and coefficient of

determination (R2) of the models to predict average PEVD of pairs of test groups

On the basis of VED               On the basis of CR                On the basis of GLT
Intercept  1452.3404 ± 0.6481     Intercept  1747.2603 ± 0.6727     Intercept  1698.5458 ± 0.4365
VED a      0.4036 ± 0.0012        CR         –181.3726 ± 0.5776     GLT        –0.4745 ± 0.0013
NB/S       2.6160 ± 0.0319        CR2        46.8462 ± 0.1801       GLT2       0.0004 ± 0.0000
(NB/S)2    –0.0271 ± 0.0004       Z          –46.7829 ± 0.1845      Z          –0.0004 ± 0.0000
-          -                      NB/S       –12.9617 ± 0.3354      NB/S       6.7573 ± 0.2472
-          -                      (NB/S)2    1.1632 ± 0.0473        (NB/S)2    –0.4526 ± 0.0351
R2         0.53                   R2         0.50                   R2         0.72
R2 b       0.51                   R2 b       0.49                   R2 b       0.71

P < 0.0001 for all parameters.

a VED = variance of estimated differences between pairs of test groups.

NB/S = ratio of harmonic means of number of bulls and number of sires for pairs of test

groups.

CR = connectedness rating between pairs of test groups.

GLT = harmonic mean of total number of direct genetic links of pairs of test groups.

Z = knot (junction point between segments) of polynomial regressions.

b % of the total variation accounted for by VED, CR, or GLT (partial R2).

Table 2.4. Estimates of parameters and coefficient of

determination (R2) of the models to predict average PEVD of each test group with

all other test groups

On the basis of VED               On the basis of CR                On the basis of GLT
Intercept  1578.2396 ± 0.4255     Intercept  1966.5089 ± 1.9429     Intercept  1819.6820 ± 5.3913
VED a      0.4361 ± 0.0221        CR         –480.9942 ± 23.2170    GLT        –0.8373 ± 0.0540
S          1.6320 ± 0.2540        CR2        152.4630 ± 12.4006     GLT2       0.0017 ± 0.0001
S2         –0.0146 ± 0.0027       Z          –123.7831 ± 16.9157    Z1         –0.0017 ± 0.0002
-          -                      S          3.774370 ± 0.2724      Z2         –0.0001 ± 0.0000
-          -                      S2         –0.0218 ± 0.0030       S          0.5275 ± 0.0702
-          -                      NB/S       18.0660 ± 1.9416       NB/S       1.8331 ± 0.3681
-          -                      (NB/S)2    0.9151 ± 0.1805        (NB/S)2    –0.0196 ± 0.0075
R2         0.55                   R2         0.82                   R2         0.79
R2 b       0.50                   R2 b       0.73                   R2 b       0.76

P < 0.0001 for all parameters.

a VED = average variance of estimated differences of each test group with all other test

groups.

S = number of sires represented in each test group.

CR = average connectedness rating of each test group with all other test groups.

NB/S = average ratio of number of bulls per sire represented in each test group.

GLT = total number of direct genetic links between each test group and all other test

groups.

Z, Z1, and Z2 = knots (junction points between segments) of polynomial regressions.

b % of the total variation accounted for by VED, CR, or GLT (partial R2).

[Figure 2.1 plots not reproduced: yearly averages from 1988 to 2000 (x-axis: Year), with PEVD plotted together with VED and CR plotted together with GLT.]

Figure 2.1. Average degree of connectedness for pairs of test groups (top) and for each test

group with all other test groups (bottom) on the basis of PEVD, VED, CR, and GLT


Figure 2.2. Observed relationship of average PEVD per test group with number of bulls

per test group, average PEVD per test group with number of sires per test group, CR with

number of bulls per test group, and GLT with number of bulls per test group


Figure 2.3. Observed relationship of average PEVD of pairs of test groups with VED,

CR, and GLT


Figure 2.4. Observed relationship of average PEVD of each test group with VED, CR,

and GLT

[Figure 2.5 panels: average PEVD (kg²) of each test group plotted against number of bulls per TG, VED (kg²), CR, and GLT, with connected and disconnected test groups marked separately.]

Figure 2.5. Observed relationship of average PEVD of each test group with number of bulls

per test group, VED, CR, and GLT for connected and disconnected test groups


Chapter 3

Additive, dominance, and epistatic loss effects

on pre-weaning gain in crossing of different Bos

taurus breeds

ABSTRACT - Objectives of this study were to estimate variance components, direct and

maternal breed additive, dominance, and epistatic loss effects, and additive genetic

changes for pre-weaning gain (kg). Data were from 478,466 animals from beef herds

enrolled with Beef Improvement Ontario (BIO), from 1986 to 1999, including records of

both purebred and crossbred animals from Angus, Blonde d’Aquitaine, Charolais,

Gelbvieh, Hereford, Limousin, Maine-Anjou, Salers, Shorthorn, and Simmental breeds.

The genetic model used in the analysis included fixed genetic effects of breed,

dominance, and epistatic loss, fixed environmental effects of age of the calf,

contemporary group, and age of the dam by sex of the calf, random additive direct and

maternal genetic effects, and random maternal permanent environment effect.

Coefficients of direct and maternal dominance effects were equal to expected direct and

maternal breed heterozygosities, respectively. Coefficients of direct and maternal epistatic

loss effects were average expected breed heterozygosities in the uniting gametes that


generated an individual. Variance components were estimated by REML. Genetic

changes of Angus, Charolais, Hereford, Limousin, and Simmental were obtained using

two approaches: through regression of average breeding values of purebred animals on

birth year, obtained separately for each breed, and through the within year regression of

breeding values on the contribution of each breed to the animals. Estimates of direct and

maternal additive genetic, maternal permanent environmental, and residual variances,

expressed as proportions of the phenotypic variance, were 0.32, 0.20, 0.12, and 0.52,

respectively. Annual additive genetic changes were positive for all breeds. Results from

the two approaches used to estimate genetic changes suggest that producers used animals

of substantially higher additive genetic value to produce purebred Charolais, Hereford,

and Simmental than to produce crossbred animals. Breeds ranked similarly to what was

expected, but estimates of both direct and maternal effects showed large standard errors.

Both direct and maternal dominance had a favourable effect (P < 0.05) on pre-weaning

gain, equivalent to 1.31% and 2.28% of the phenotypic mean, respectively. The

corresponding estimates for direct and maternal epistatic loss effects were –2.19% (P < 0.05) and –0.08%

(P > 0.05), respectively.

Key words: beef cattle, genetic trends, heterosis, variance components.

Abbreviations: AN, Angus; BD, Blonde d’Aquitaine; CH, Charolais; E, epistatic loss

effect; ED, coefficient of direct epistatic loss effect; EM, coefficient of maternal epistatic

loss effect; GV, Gelbvieh; H, dominance effect; HD, coefficient of direct dominance

effect; HE, Hereford; HM, coefficient of maternal dominance effect; LM, Limousin; MA,

Maine-Anjou; SA, Salers; SH, Shorthorn; SM, Simmental.


INTRODUCTION

Pre-weaning gain is an economically important trait that receives considerable

attention in the multi-breed genetic evaluation of beef cattle in many countries. Both

direct and maternal effects contribute to the growth of young beef cattle. To obtain a

reliable ranking of animals in the genetic evaluation of a multi-breed population, both

additive and non-additive genetic effects have to be accounted for (Arthur et al., 1999).

Non-additive effects are represented by dominance and epistatic effects, which result from

intra- and inter-locus interactions, respectively. Both dominance and epistatic effects are

components of heterosis in crossbred animals. Estimates of such effects should be

obtained from the dataset used to evaluate the animals, provided that there are enough

records to generate reliable estimates.

In beef cattle improvement programs, dominance effects associated with breed

heterozygosity are generally taken into account in the estimation of breeding values of

crossbred animals. Additive-dominance models, which simultaneously estimate additive

and heterotic effects or estimate additive effects after pre-adjustment of records for

heterosis on the basis of breed heterozygosity, are standard models. These models have

been used in large beef cattle populations in Canada (Miller, 1996; Sullivan et al., 1999),

Brazil (Roso and Fries, 1998), Australia (Johnston et al., 1999), and the USA (Pollak and

Quaas, 1998; Klei et al., 2002).

The justification for additive-dominance models is based on the assumption that

heterosis is mainly due to dominance effects, in agreement with results obtained in large

beef cattle crossbreeding experiments conducted at the United States Department of

Agriculture Meat Animal Research Center, Clay Center, Nebraska (Gregory et al., 1997).

According to these authors, the heterosis observed in growth traits of beef cattle is likely


due to dominance effects of genes and represents the recovery of accumulated inbreeding

depression within populations that have been genetically isolated from each other for

many generations. Studies of Gregory et al. (1997) suggested that retention of heterosis is

linearly proportional to heterozygosity. A similar relationship between heterosis and

heterozygosity was observed by Arthur et al. (1999) and Fries et al. (2000). In these two

latter studies, however, the authors suggested that another component, the epistatic loss

effect, could be added to the additive-dominance model to provide a better explanation of

the genetic differences between animals of different breed compositions. The epistatic

loss in crossbred animals represents the effect due to the breakdown of favourable

interactions between loci existing in purebred animals, which have been built up by both

natural and artificial selection within breeds (Koch et al., 1985).

Crossbreeding is a common practice in the beef industry. Because an important

objective of crossbreeding is to take advantage of breed additive and between-breed non-

additive genetic effects, analysis of additive, dominance and epistatic loss effects is

important when evaluating commercial cattle.

Objectives of this study were to estimate variance components, direct and maternal

breed additive, dominance, and epistatic loss effects, and breed additive genetic changes

for pre-weaning gain in a typical multi-breed population of beef cattle.


Data

The data used in this study were pre-weaning weight gain records of animals from beef herds

enrolled with Beef Improvement Ontario (BIO), from 1986 to 1999. The dataset after

preliminary edits consisted of 869,050 records, including records of both purebred and


crossbred animals. A subset of purebred and crossbred animals from the 10 most popular

breeds, including Angus (AN), Blonde d’Aquitaine (BD), Charolais (CH), Gelbvieh (GV),

Hereford (HE), Limousin (LM), Maine-Anjou (MA), Salers (SA), Shorthorn (SH), and

Simmental (SM), was used in the analysis. Some animals had a fraction of the breed

composition from an undetermined breed, which was treated as another breed, named

Unknown (UN). Only records of animals with complete information for calculating direct

and maternal dominance and epistatic loss coefficients (described later) were kept.

Connectedness analysis

An analysis to check for connectedness among contemporary groups (herd-year-

season-management group) across breeds was performed. The method used was the total

number of direct genetic links between contemporary groups due to common sires and

dams (GLT), which was described in Chapter 2. Contemporary groups with more than 10

calves and with at least 10 direct genetic links and two classes of direct or maternal

heterozygosities (described later) were considered connected and retained for the

analysis. The resulting dataset included 23,059 contemporary groups, 478,466 calves,

19,908 sires, and 234,608 dams. A pedigree file of 714,220 animals was used in the

analysis.
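The retention rule described above can be sketched as a simple filter. This is a minimal illustration, not the procedure actually used in the study: the record layout and the field names are hypothetical, and "two classes of direct or maternal heterozygosities" is read here as "at least two classes", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ContemporaryGroup:
    group_id: str
    n_calves: int                  # calves recorded in the group
    n_genetic_links: int           # GLT: direct links through common sires and dams
    n_heterozygosity_classes: int  # distinct classes of direct or maternal heterozygosity


def is_connected(cg: ContemporaryGroup) -> bool:
    """Apply the three thresholds stated in the text (interpretation assumed)."""
    return (
        cg.n_calves > 10
        and cg.n_genetic_links >= 10
        and cg.n_heterozygosity_classes >= 2
    )


# Hypothetical contemporary groups, for illustration only.
groups = [
    ContemporaryGroup("cg1", n_calves=8, n_genetic_links=25, n_heterozygosity_classes=3),
    ContemporaryGroup("cg2", n_calves=40, n_genetic_links=12, n_heterozygosity_classes=2),
    ContemporaryGroup("cg3", n_calves=15, n_genetic_links=4, n_heterozygosity_classes=2),
    ContemporaryGroup("cg4", n_calves=30, n_genetic_links=50, n_heterozygosity_classes=1),
]
retained = [g.group_id for g in groups if is_connected(g)]
print(retained)
```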

Predictor variables of fixed genetic effects

(1) Breed additive effects

Coefficients for direct and maternal breed additive effects were equal to the genetic

contribution of each breed to the breed composition of the calf and to the breed


composition of the dam, respectively. The estimates of direct and maternal breed additive

effects were expressed as differences relative to Angus.

Breed compositions of the animals are depicted in Figures 3.1 and 3.2. Figure 3.1

shows that fewer than 40% of the calves were purebred, indicating that commercial

beef herds favoured crossbred over straightbred calves. Most crossbred calves

originated from two-breed crosses. In contrast, most sires (89.3%) and dams

(61.3%) were purebred. Figure 3.2 shows that breeding practices in the commercial beef

herds studied resulted in an unbalanced number of animals among breeds. There were

substantially larger numbers of Angus, Charolais, Hereford, Limousin, and Simmental

calves, sires, and dams than of Blonde d’Aquitaine, Gelbvieh, Maine-Anjou, Salers, and

Shorthorn. A considerable share of calves, sires, and dams (21.29%, 7.87%, and 15.62%,

respectively) had some fraction of unknown breed in their breed composition, with average

unknown fractions of 18%, 16%, and 40%, respectively. These animals were

kept in the analysis because they provided useful information for estimating other effects

considered in the genetic model.

(2) Dominance effects

Coefficients of direct (HD) and maternal (HM) dominance effects were equal to

expected direct and maternal breed heterozygosities, respectively. HD and HM were

calculated using the following equations:

\[ H_D = 1 - \sum_{i=1}^{nb} S_i \times D_i \]

and

\[ H_M = 1 - \sum_{i=1}^{nb} MGS_i \times MGD_i, \]

where nb is the number of breeds (11), and Si, Di, MGSi, and MGDi are the fractions of

the ith breed for the sire, dam, maternal grandsire, and maternal granddam breed

composition, respectively.

(3) Epistatic loss effects

For estimating epistatic loss effects, it was assumed that the parents of an individual

produce more recombinant gametes the larger their breed heterozygosities. Thus, the

coefficients for direct (ED) and maternal (EM) epistatic loss effects were calculated as the

average breed heterozygosities in uniting gametes that generated the individual (Fries et

al., 2000). Epistatic loss will be proportional to the average heterozygosity observed in

parents and will be maximum when both parents of an individual are F1s. ED and EM

were calculated as:

\[ E_D = 0.5\,(H_{Sire} + H_{Dam}) \]

and

\[ E_M = 0.5\,(H_{MGS} + H_{MGD}), \]

where HSire, HDam, HMGS, and HMGD are the expected breed heterozygosities of the sire,

dam, maternal grandsire, and maternal granddam, respectively. The average epistatic loss

due to the breakdown of all kinds of gene interactions, as deviation from the average

additive and dominance effects, will be estimated by ED and EM (Fries et al., 2002).

Table 3.1 shows coefficients of direct and maternal dominance and epistatic genetic

effects for different mating systems involving two breeds, A and B.
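The coefficient definitions above can be sketched in a few lines of code. The example is a minimal illustration under stated assumptions: breed compositions are held in dictionaries keyed by hypothetical breed labels A and B, the function names are not from the thesis, and the heterozygosity of each parent is obtained by applying the same formula to that parent's own sire and dam.

```python
def heterozygosity(comp_a: dict, comp_b: dict) -> float:
    """Expected breed heterozygosity of the progeny of two parents.

    Implements 1 - sum_i(a_i * b_i), where a_i and b_i are the fractions of
    breed i in the two parental breed compositions.
    """
    breeds = set(comp_a) | set(comp_b)
    return 1.0 - sum(comp_a.get(b, 0.0) * comp_b.get(b, 0.0) for b in breeds)


def epistatic_loss(h_parent_1: float, h_parent_2: float) -> float:
    """Average breed heterozygosity of the two parents (E_D or E_M)."""
    return 0.5 * (h_parent_1 + h_parent_2)


pure_a = {"A": 1.0}
pure_b = {"B": 1.0}
f1 = {"A": 0.5, "B": 0.5}

h_pure = heterozygosity(pure_a, pure_a)  # a purebred parent has no breed heterozygosity
h_f1 = heterozygosity(pure_a, pure_b)    # an F1 parent is fully heterozygous

# F1 calf: purebred A sire x purebred B dam.
hd_f1_calf = heterozygosity(pure_a, pure_b)
ed_f1_calf = epistatic_loss(h_pure, h_pure)

# F2 calf: F1 sire x F1 dam, with purebred maternal grandparents.
hd_f2_calf = heterozygosity(f1, f1)
ed_f2_calf = epistatic_loss(h_f1, h_f1)
hm_f2_calf = heterozygosity(pure_a, pure_b)   # heterozygosity of the F1 dam
em_f2_calf = epistatic_loss(h_pure, h_pure)   # maternal grandparents are purebred

print(hd_f1_calf, ed_f1_calf, hd_f2_calf, ed_f2_calf, hm_f2_calf, em_f2_calf)
```

The F2 case reproduces the statement in the text that the epistatic loss coefficient is at its maximum when both parents of an individual are F1s.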


The distribution of observations among coefficients of dominance and epistatic loss

effects is presented in Table 3.2. For ease of presentation, coefficients of dominance and

epistatic loss effects were grouped in classes of 0.125, ranging from zero to one. Numbers

in Table 3.2 suggest that there was a better distribution of observations among classes of

coefficients of direct and maternal dominance than among classes of coefficients of direct

and maternal epistatic loss effects. Because only approximately 10% of the sires are

crossbred (Figure 3.1), there were relatively few observations in the classes of

coefficients of epistatic loss effects larger than 0.625. The mean and standard deviation of

pre-weaning gain and predictor variables considered in the analysis are presented in Table

3.3.

Genetic analysis

The genetic model for pre-weaning gain, defined in matrix notation, was:

y = Xb + Fv + Za + Wm + Sp + e,

where

y = vector of observations;

b = vector of fixed genetic effects. This vector included direct and maternal breed

additive, dominance, and epistatic loss effects;

v = vector of fixed environmental effects. This vector included age of the calf as a

covariate (linear and quadratic effects), and age of the dam by sex of the calf and

contemporary group (herd-year-season-management group) as classification variables;

a = vector of random direct additive genetic effects;

m = vector of random maternal additive genetic effects;

p = vector of random maternal permanent environment effects, and

e = vector of random residual effects.

X, F, Z, W, and S are incidence matrices relating records to fixed genetic, fixed

environmental, direct genetic, maternal genetic, and permanent environment effects,

respectively.

The vectors of random effects a, m, p, and e were assumed to have (co)variance

matrices equal to $A\sigma_a^2$, $A\sigma_m^2$, $I\sigma_p^2$, and $I\sigma_e^2$, respectively, where A is the additive

numerator relationship matrix among animals and I is an identity matrix. Covariance

between a and m was assumed equal to $A\sigma_{am}$. Homogeneity of variances and the same

dominance and epistatic loss effects for crosses of different pairs of breeds, and no

interactions between genetic and environmental effects were assumed.
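In block form, the assumed (co)variance structure can be written as below. This is a sketch rather than a statement taken from the source: the zero blocks, which express no covariance of p and e with the other random effects, are an assumption that is usual for this class of model but is not spelled out in the text, and the two identity matrices generally differ in order.

```latex
\[
\mathrm{Var}\begin{bmatrix} a \\ m \\ p \\ e \end{bmatrix}
=
\begin{bmatrix}
A\sigma_a^2    & A\sigma_{am} & 0             & 0 \\
A\sigma_{am}   & A\sigma_m^2  & 0             & 0 \\
0              & 0            & I\sigma_p^2   & 0 \\
0              & 0            & 0             & I\sigma_e^2
\end{bmatrix}
\]
```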

Estimates of (co)variance components ($\sigma_a^2$, $\sigma_m^2$, $\sigma_p^2$, $\sigma_e^2$, and $\sigma_{am}$) and estimates of the

effects included in the model were obtained using the DMU program (Madsen and

Jensen, 2000). First, (co)variance components were estimated by the restricted maximum

likelihood method, using a data subset containing 300,002 records from randomly

sampled herds to overcome computational limitations. Given the estimated

(co)variance components, the estimates of the effects in the model were obtained using

the complete dataset.

Multi-breed additive genetic changes

To estimate genetic changes for the breeds with the largest number of records (Angus,

Charolais, Hereford, Limousin, and Simmental), two different approaches were used:

(1) Regression of average estimated breeding values of purebred calves on birth year,

computed separately for each breed, and


(2) Regression of estimated breeding values on contribution of each breed to the breed

composition of the calves in a given birth year (Klei et al., 2002; Elzo et al., 2004).

The regression approach (2) for calculating the yearly means of each breed used

information of both purebred and crossbred animals. Thus, the regression coefficient

obtained for each breed, in every year, accounted for additive genetic changes due to

alleles coming from both purebred and crossbred animals. Differences between breed

regression coefficients and yearly average estimated breeding values of purebred animals

were calculated to determine the genetic contribution of crossbred animals.
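Approach (2) can be illustrated with a small least-squares sketch for a single birth year. It assumes a regression without intercept, which is a choice made here because the breed fractions of each calf sum to one; the source does not state how the regression was parameterised. The helper names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def breed_means_by_regression(fractions, ebvs):
    """Regress EBV on breed fractions (no intercept) for one birth year.

    fractions: one row per calf, breed fractions in a fixed breed order.
    ebvs:      estimated breeding value of each calf.
    """
    k = len(fractions[0])
    xtx = [[sum(row[i] * row[j] for row in fractions) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(fractions, ebvs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical calves of one birth year; breed order is (AN, CH).
fractions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]
ebvs = [2.0, 6.0, 3.0, 3.0]  # crossbreds fall below the purebred midpoint of 4.0

coef_an, coef_ch = breed_means_by_regression(fractions, ebvs)
print(round(coef_an, 6), round(coef_ch, 6))
```

In this constructed example the regression coefficients are lower than the purebred averages (2.0 and 6.0) because the crossbred calves carry alleles of lower additive value, which is the kind of difference the comparison of the two approaches is intended to reveal.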

The number of purebred and crossbred (expressed as equivalent to purebred) calves

contributing to genetic changes of each breed is presented in Figure 3.3. To express the

number of crossbred calves as an equivalent number of purebred calves, breed fractions that

differed from 1.0 in the breed composition were summed over all calves. The comparison

among breeds in Figure 3.3 reveals that, proportionally, more alleles

of Charolais, Limousin, and Simmental than of Angus and Hereford were carried by

crossbred animals.
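The purebred-equivalent count described above amounts to summing, for a given breed, the fractions carried by crossbred calves. A minimal sketch follows; the function names and the list of fractions are hypothetical.

```python
def purebred_equivalents(breed_fractions):
    """Sum the fraction of one breed over calves that are not purebred for it."""
    return sum(f for f in breed_fractions if 0.0 < f < 1.0)


def purebred_count(breed_fractions):
    """Number of calves that are purebred for the breed."""
    return sum(1 for f in breed_fractions if f == 1.0)


# Hypothetical Charolais fractions of seven calves.
charolais_fractions = [1.0, 0.5, 0.5, 0.25, 0.0, 1.0, 0.75]

n_purebred = purebred_count(charolais_fractions)
n_equivalent = purebred_equivalents(charolais_fractions)
print(n_purebred, n_equivalent)
```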

RESULTS

(Co)variance components

Estimates of direct additive genetic variance ($\sigma_a^2$), maternal additive genetic variance

($\sigma_m^2$), maternal permanent environmental variance ($\sigma_p^2$), residual variance ($\sigma_e^2$), and

direct by maternal additive genetic covariance ($\sigma_{am}$) of pre-weaning gain are presented in

Table 3.4. For ease of interpretation variances were expressed as proportions of

phenotypic variance ($\sigma_t^2$), where $\sigma_t^2 = \sigma_a^2 + \sigma_m^2 + \sigma_{am} + \sigma_p^2 + \sigma_e^2$. Thus, $h_a^2 = \sigma_a^2 / \sigma_t^2$,

$h_m^2 = \sigma_m^2 / \sigma_t^2$, $p^2 = \sigma_p^2 / \sigma_t^2$, and $e^2 = \sigma_e^2 / \sigma_t^2$. The correlation between direct and

maternal genetic effects was calculated by $r_{am} = \sigma_{am} / (\sigma_a \sigma_m)$.
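As a consistency check on these definitions, the sketch below recombines the proportions and the direct-maternal correlation reported in this chapter (0.32, 0.20, 0.12, 0.52, and –0.63). The variable names are illustrative. Because the four proportions add to more than one, the covariance term, expressed relative to the phenotypic variance, has to be negative for the total to return to one.

```python
import math

# Proportions of phenotypic variance and correlation reported in this chapter.
h2_a, h2_m, p2, e2 = 0.32, 0.20, 0.12, 0.52
r_am = -0.63

# sigma_am / sigma_t^2 follows from r_am = sigma_am / (sigma_a * sigma_m).
cov_am_ratio = r_am * math.sqrt(h2_a * h2_m)

# All components of the phenotypic variance, as proportions of sigma_t^2.
total = h2_a + h2_m + cov_am_ratio + p2 + e2
print(round(cov_am_ratio, 3), round(total, 3))
```

The covariance term comes to about –0.16 of the phenotypic variance, so the components sum to approximately one, within the rounding of the published estimates.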

Multi-breed additive genetic changes

The yearly average estimated breeding values of purebred animals and the regression

of yearly estimated breeding values on contribution of each breed to the breed

composition of the animals for breeds Angus, Hereford, Charolais, Limousin, and

Simmental are depicted in Figure 3.4. The additive genetic changes in pre-weaning gain

per year are presented in Table 3.5.

All breeds showed positive additive genetic changes (P < 0.01). Estimates of genetic

changes obtained by the regression and by the average method should be similar if the

sample of breed alleles coming from purebred animals has similar additive genetic value

to the sample of breed alleles coming from crossbred animals. Average estimated

breeding values and regression coefficients for Angus were similar from 1986 to 1993. In

the last three years, regression coefficients were larger than average estimated breeding

values. Additive genetic changes of Charolais, Hereford, and Simmental followed patterns

similar to one another, with average estimated breeding values larger than regression

coefficients. These results suggest that alleles coming from purebred Charolais, Hereford,

and Simmental have higher additive genetic values than alleles coming from crossbred

animals. With regard to Limousin, average estimated breeding values were larger than

regression coefficients for most years, but there was no clear trend pointing out

differences between the two approaches.

To determine the influence of the correlation between direct and maternal genetic

effects on estimates of breed additive genetic changes, an additional analysis assuming a


zero correlation between direct and maternal genetic effects, based on national

recommendations for Canada (AAFC, 1993), was performed. Additive genetic changes

per year did not greatly differ from those presented in Table 3.5, where a correlation of

–0.63 between direct and maternal genetic effects was used. Differences in the genetic

changes per year (kg) were equal to or lower than 0.03% for all breeds.

Dominance and epistatic loss effects

Estimates of direct and maternal dominance and epistatic loss effects on pre-weaning

gain associated with breed heterozygosity are presented in Table 3.6. The magnitudes of

both dominance and epistatic loss effects were small. Expressed relative to the phenotypic

mean, direct and maternal dominance effects had a positive effect (P < 0.05) of 1.31%

and 2.28%, respectively. Direct and maternal epistatic loss effects were equal to –2.19%

(P < 0.05) and –0.08% (P > 0.05), respectively.
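To show how these estimates combine for a given mating type, the sketch below multiplies each estimate by the corresponding coefficient and sums the products. It rests on two assumptions that the text does not confirm: that each percentage is the effect per unit of its coefficient (for example, at 100% heterozygosity), and that the four effects are additive. The function name is hypothetical, and the maternal epistatic loss estimate is used even though it was not significantly different from zero.

```python
# Estimates from the text, as % of the phenotypic mean per unit of coefficient
# (the per-unit interpretation is an assumption).
EFFECTS_PCT = {"HD": 1.31, "HM": 2.28, "ED": -2.19, "EM": -0.08}


def expected_non_additive_pct(hd: float, hm: float, ed: float, em: float) -> float:
    """Expected non-additive deviation (% of phenotypic mean) for given coefficients."""
    return (
        EFFECTS_PCT["HD"] * hd
        + EFFECTS_PCT["HM"] * hm
        + EFFECTS_PCT["ED"] * ed
        + EFFECTS_PCT["EM"] * em
    )


# F1 calf from purebred parents of two breeds: HD = 1, all other coefficients 0.
f1_calf = expected_non_additive_pct(hd=1.0, hm=0.0, ed=0.0, em=0.0)

# F2 calf from F1 x F1 parents: HD = 0.5, HM = 1 (F1 dam), ED = 1, EM = 0.
f2_calf = expected_non_additive_pct(hd=0.5, hm=1.0, ed=1.0, em=0.0)

print(round(f1_calf, 3), round(f2_calf, 3))
```

Under these assumptions the favourable maternal dominance of the F2 calf is largely offset by direct epistatic loss, which is the pattern that an additive-dominance model alone would not capture.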

Breed additive effects

Estimates of direct and maternal breed additive effects on pre-weaning gain,

expressed as deviations from Angus, are presented in Table 3.7. Estimates of direct breed

additive effects of Hereford, Limousin, and Shorthorn were lower than estimates of

Angus. Salers slightly exceeded Angus (0.60 kg), while Charolais, Gelbvieh, Maine-

Anjou, and Simmental exceeded Angus by more than 10 kg for direct effects.

Estimates of maternal breed additive effects of Blonde d’Aquitaine, Charolais, and

Hereford were lower than Angus. Limousin and Maine-Anjou exceeded Angus by less

than one kg. Gelbvieh, Salers, Shorthorn, and Simmental exceeded Angus by more than

4.5 kg.


The standard errors of the estimates of both direct and maternal breed additive effects

were large for all the breeds and greater for those breeds represented by a small number of

calves (Blonde d’Aquitaine, Gelbvieh, Maine-Anjou, Salers, and Shorthorn).

Sampling correlations

The dataset used in this investigation came from commercial herds and, therefore, was

not designed to estimate breed additive, dominance, and epistatic loss effects.

Cunningham and Connolly (1989) showed that high correlation between estimates might

jeopardize the precision of estimation of genetic effects. Even estimable functions may be

highly confounded.

For obtaining information with regard to degree of confounding between estimates,

sampling correlations among additive, dominance, and epistatic loss effects were

calculated (Table 3.8). The sampling correlation between maternal dominance and direct

epistatic loss effects was very high, indicating that it was very difficult to separate the

unique effect of each of these two genetic effects. Sampling correlations between breeds

were generally high. Sampling correlations of direct breed additive effects with maternal

breed additive effects within the same breed were greater than sampling correlations

between different breeds. Thus it was generally more difficult to separate direct and

maternal additive genetic effects within breeds than between breeds.
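The link between correlated predictor variables and the sampling correlation of their estimates can be illustrated with a simplified ordinary least squares sketch for two predictors. It is not the mixed-model computation used in the study: the two-predictor, no-intercept setting, the function names, and the data are all assumptions made for illustration.

```python
import math


def estimate_covariance_2x2(x1, x2):
    """Inverse of X'X for two predictors (the (co)variance of the OLS
    estimates up to the residual variance)."""
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    c = sum(v * v for v in x2)
    det = a * c - b * b
    return [[c / det, -b / det], [-b / det, a / det]]


def sampling_correlation(cov, i, j):
    """Correlation between estimates i and j from their (co)variance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


# Hypothetical predictor values, expressed as deviations from their means,
# standing in for two strongly related coefficients such as HM and ED.
x1 = [-0.5, -0.5, 0.5, 0.5]
x2 = [-0.5, -0.25, 0.25, 0.5]

cov = estimate_covariance_2x2(x1, x2)
r = sampling_correlation(cov, 0, 1)
print(round(r, 4))
```

In this two-predictor case the sampling correlation equals minus the correlation between the predictors, so predictors that move together produce estimates that are hard to separate.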

DISCUSSION

Estimates of $h_a^2$, $h_m^2$, $p^2$, and $e^2$ obtained in this study were compared to estimates

from previous studies of Miller (1996) in Ontario and with average results reported by

Koots et al. (1994a) in a review of a large number of published estimates of genetic

parameters. Estimates of $h_a^2$, $h_m^2$, $p^2$, and $e^2$ were equal to 0.32, 0.20, 0.12, and 0.52,

respectively. Estimates of $h_a^2$ and $h_m^2$ were in line with pooled estimates of 0.27 and 0.23,

respectively, reported by Koots et al. (1994a). The estimate of $h_a^2$ also did not greatly

differ from Sullivan et al. (1999), where a $h_a^2$ equal to 0.30 was used in the estimation of

genetic trends and mean genetic differences among breeds in Ontario. Estimates of $h_a^2$,

$h_m^2$, and $p^2$ were lower than estimates of 0.44, 0.25 and 0.15, respectively, obtained by

Miller (1996). Differences between estimates of (co)variance components of this study

and estimates of Miller (1996) are likely due to differences in the datasets and models

used. The dataset used by Miller (1996) was considerably smaller (75,365 records) and

direct and maternal epistatic loss effects were not fitted. Additional analyses dropping

epistatic loss effects, however, suggest that differences in the datasets were the main

cause of differences between estimates of variance components of the present study and

the study of Miller (1996).

The correlation between direct and maternal genetic effects on pre-weaning gain

(ram = –0.63), although lower in absolute value than the estimate of –0.77 obtained by

Miller (1996), was still strongly negative. This result is in marked contrast with the average

estimate of –0.25 reported by Koots et al. (1994b), and is greater in absolute value than the estimates of Meyer

(1992) and Robinson (1996), where average values of –0.59 and –0.47 were reported. A

possible cause contributing to the strong negative genetic correlation is the small

proportion of female calves with records that later had their own progeny. In the dataset

there were only 23,508 cases where a female calf later became a cow, corresponding to

approximately 10% of all female calves.

The two approaches used to estimate additive genetic changes of Angus, Charolais,

Hereford, Limousin, and Simmental indicated positive annual changes in these breeds.


Average genetic changes per year, estimated by both approaches, were lower than 0.20%

of the phenotypic mean for all breeds. Smith (1984), assuming single-trait selection,

calculated a theoretically possible rate of genetic change in beef cattle growth

traits of 1.4% of the mean per year. The same author reported that rates of genetic change of 0.7

and 0.3% per year were achieved in long-term selection experiments and breeding

programs in practice, respectively. Selection practices in Ontario are based on multiple-

trait selection and, therefore, lower genetic changes in individual traits than under single

trait selection are expected. The genetic improvement, however, will be balanced over

several traits, improving the overall economic merit.

The regression approach for calculating yearly genetic means for each breed allowed

the contribution of crossbred and purebred animals in the population to be included,

making full use of all available information when calculating additive genetic changes.

Comparison of results obtained by the regression approach with the traditional average

estimated breeding values of purebred animals revealed that producers used animals of

substantially higher additive genetic value to produce purebred Charolais, Hereford, and

Simmental than to produce crossbred animals, which could reflect different selection

goals, population sizes, and sire availability per breed. The additive genetic value of

Angus and Limousin for producing purebred or crossbred animals tended to be similar.

Both direct and maternal dominance effects showed a favourable effect on pre-

weaning gain. Direct and maternal epistatic loss effects had the anticipated negative

effect. However, for dominance and epistatic loss effects, the magnitude of the estimates

was small. Epistatic effects on a specific trait may be either favorable or unfavorable,

depending on selection history of the population and genetic correlations among traits.

Favorable epistatic effects may result from direct selection for a particular trait, while


unfavorable effects may result from correlated response of traits with antagonistic genetic

correlation (Cassady et al., 2002). Direct and maternal dominance were equivalent to

1.31% and 2.28% of the phenotypic average. Direct and maternal epistatic loss effects

were equivalent to –2.19% and –0.08%, respectively. The maternal epistatic loss effect

was not statistically different from zero, probably reflecting the deficiency in the structure

of data to estimate this genetic effect, as shown in Table 3.2. To detect a significant

effect, a larger proportion of crossbred sires might be required. The small proportion of

crossbred sires (and grandsires) in the dataset has two consequences. Firstly, it reduces

the expression of epistatic loss because at least one allele at each locus will be from a

parental breed in a large proportion of crossbred progeny, which reduces the breakdown

of favorable interactions established in the pure breeds (Kinghorn, 1983). Secondly, it

increases the dependence between dominance and epistatic effects, causing collinearity

between these two genetic effects. Additional analyses, assuming a zero covariance

between direct and maternal genetic effects (not reported), resulted in estimates of direct

and maternal dominance and epistatic loss effects similar to those obtained when a non-

zero covariance was used.

According to results obtained by Gregory et al. (1997) in a large beef cattle

crossbreeding experiment, the heterosis observed in growth traits of beef cattle is likely

due to dominance effects. This observation allows fitting heterosis as being proportional

to the probability that alleles at a locus come from different breeds, which is equal to the

breed heterozygosity. Further analysis fitting only dominance effects in the model

(excluding epistatic loss effects) resulted in estimates of direct and maternal dominance of

1.31% and 1.84%, respectively. Therefore, estimates of dominance effects from both

models did not greatly differ. These results were also in close agreement with Miller


(1996), who reported estimates of direct and maternal heterosis of 1.34% and 2.28%,

respectively, assuming a dominance model. In the multi-breed genetic evaluation

currently run in Ontario, records are pre-adjusted for heterosis on the basis of

heterozygosity. For pre-weaning growth traits, direct and maternal heterosis of 5% are

assumed for an individual with heterozygosity of 100%, regardless of the breeds involved

(Sullivan et al., 1999).

Koch et al. (1985) evaluated dominance and epistatic loss effects on weaning gain of

Angus × Hereford crosses. In their study, direct dominance and epistatic loss effects were

not significant despite the relatively large negative values. They stated that a larger

dataset and a more complete array of mating types would be needed to attain statistically

significant results. In a review of a large number of experimental results including beef

cattle, dairy cattle, pigs, poultry, and sheep, Sheridan (1981) found that, in many cases,

the level of heterosis in crossbreeding populations other than the F1 was substantially

below expectation on the basis of heterozygosities. The conclusion of the review of

Sheridan (1981) was that, based on the performance of purebred and F1 populations, it

was not possible to predict the level of heterosis in other various genotypes, suggesting

the presence of epistatic effects. According to Cunningham (1987), although in some

cases epistatic loss effects can be safely neglected, their proper evaluation is one of the

unsolved problems of animal breeding research. Recent studies have reported epistatic

loss on pre-weaning gain in crosses between Bos taurus and Bos indicus (Fries et al.,

2000; Piccoli et al., 2002; Demeke et al., 2003; Pimentel et al., 2003; Cardoso, 2004).

Because Bos taurus and Bos indicus have greater genetic distance (larger potential

differences in gene frequencies), Bos taurus × Bos indicus crosses generally express a


higher level of heterosis in comparison to crosses between Bos taurus breeds. As a

consequence, greater epistatic loss is expected in their crosses.

Standard errors of maternal dominance and direct epistatic loss effects were large, in

comparison to standard errors of direct dominance and maternal epistatic loss effects.

Estimates of maternal dominance and direct epistatic loss effects were of comparable

magnitude, although opposite in sign. The sampling correlation between estimates was

very high, likely due to a structural deficiency of the data to separate maternal dominance

and epistatic loss effects and/or due to linear dependencies (multicollinearity) involving

predictor variables of maternal dominance (HM) and direct epistatic loss (ED) effects.

Estimates of breed additive effects were in general agreement with what was

expected based on previous studies of Miller (1996). Further analysis assuming a zero

genetic covariance between direct and maternal genetic effects resulted in small changes

in the estimates of breed effects and no changes in the rank of the breeds (not reported).

Standard errors of the estimates of breed effects and sampling correlations between

estimates, particularly between direct and maternal breed effects, were high. These results

could be a symptom of lack of enough information to estimate both direct and maternal

breed additive effects and/or multicollinearity among corresponding predictor variables.

With a high degree of multicollinearity, estimates of regression coefficients obtained by

ordinary least squares methods typically have large standard errors, indicating that they

could be highly confounded. In addition, a high degree of multicollinearity would result

in breed estimates that are sensitive to changes in the dataset.

Because estimates of breed effects comprise part of the across-breed estimated

breeding values (ABC) used as selection criteria for across breed comparisons in the beef

industry, lack of enough information in the data to adequately separate breed additive

effects and/or multicollinearity among predictor variables of breed effects may result in

less reliable ranking of the animals.

CONCLUSIONS

Estimates of (co)variance components of pre-weaning gain of beef cattle did not

greatly differ from previous studies in Ontario. The large estimated negative genetic

correlation between direct and maternal effects seems to be more likely a consequence of

lack of enough information in the dataset to separate these effects than an indication of a

true negative relationship.

Annual additive genetic changes were positive for all major breeds evaluated

(Angus, Charolais, Hereford, Limousin, and Simmental). The traditional approach for

estimating genetic changes based on information from purebred calves and the alternative

approach based on information from both purebred and crossbred calves revealed

differences in selection practices among breeds. Producers used animals of substantially

higher additive genetic values to produce purebred Charolais, Hereford, and Simmental

than to produce crossbred animals. Producers of Angus and Limousin used animals of

similar genetic values to produce both purebred and crossbred animals.

Both direct and maternal dominance effects caused a favourable effect on pre-

weaning gain. Direct epistatic loss reduced the performance of the animals, whereas

maternal epistatic loss did not significantly affect the pre-weaning gain.

Results from this study provide further evidence that the level of direct and

maternal heterosis on pre-weaning gain in Ontario is lower than 5%, which indicates that

this assumed level should be reviewed in the multi-breed genetic evaluation.

Breeds ranked similarly to what was expected, but estimates were highly unstable,

showing high standard errors. Further investigation to detect the causes of instability and

application of alternative statistical methods are warranted.

Table 3.1. Coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects for different mating systems involving two breeds, A and B

Sire   Dam   fA^a   HD    HM    ED    EM
Parental
A      A     1      0     0     0     0
B      B     0      0     0     0     0
F1
A      B     ½      1     0     0     0
B      A     ½      1     0     0     0
Backcrosses
A      AB    ¾      ½     1     ½     0
B      AB    ¼      ½     1     ½     0
AB     B     ¼      ½     0     ½     0
AB     A     ¾      ½     0     ½     0
Advanced generations
F1     F1    ½      ½     1     1     0
F2     F2    ½      ½     ½     ½     1
F3     F3    ½      ½     ½     ½     ½

a Fraction of breed A in the breed composition of the animal

Table 3.2. Distribution of observations among coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects

Class^a   HD        HM        ED        EM
0.000     184,115   285,368   261,239   413,347
0.125     18,825    9,159     32,202    10,324
0.250     15,414    9,476     29,755    11,710
0.375     3,322     759       10,764    1,563
0.500     62,701    27,320    119,984   33,332
0.625     8,622     5,568     15,871    6,137
0.750     6,722     2,193     4,184     1,647
0.875     2,630     352       434       10
1.000     176,115   138,271   4,033     396

a Coefficients of dominance and epistatic loss effects were grouped in classes of 0.125, ranging from zero to one. Every class included fractions equal to or smaller than the mentioned class.

Table 3.3. Mean and standard deviation (SD) of pre-weaning gain (PWG), weaning age (Age), coefficients of direct and maternal breed additive, dominance (HD and HM), and epistatic loss (ED and EM) genetic effects

Trait^a       Mean ± SD
PWG (kg)      203.47 ± 49.41
Age (days)    203.95 ± 30.87
HD            0.47 ± 0.44
HM            0.34 ± 0.45
ED            0.19 ± 0.24
EM            0.05 ± 0.15

                    Direct                     Maternal
Breed               Mean ± SD^b   Calves^c     Mean ± SD    Dams^d
Angus               0.59 ± 33     77,324       0.74 ± 29    62,612
Blond D’Aquitane    0.61 ± 28     15,469       0.79 ± 25    8,143
Charolais           0.58 ± 27     163,148      0.71 ± 27    109,477
Gelbvieh            0.58 ± 26     3,750        0.73 ± 25    1,807
Hereford            0.56 ± 33     235,681      0.75 ± 29    225,368
Limousin            0.60 ± 24     115,137      0.72 ± 25    59,127
Maine-Anjou         0.54 ± 27     8,769        0.70 ± 28    5,818
Salers              0.70 ± 28     3,051        0.81 ± 24    8,830
Shorthorn           0.42 ± 30     30,503       0.64 ± 30    28,722
Simmental           0.58 ± 29     140,667      0.69 ± 26    107,581
Unknown             0.18 ± 14     101,902      0.40 ± 25    75,676

a Coefficients of direct and maternal breed additive, dominance, and epistatic loss genetic effects range from zero to one.
b Only animals containing some portion of the indicated breed were included for calculating mean and standard deviation.
c Number of calves containing some portion of the indicated breed.
d Number of dams containing some portion of the indicated breed.

Table 3.4. Estimates of (co)variance components and genetic parameters of pre-weaning gain (kg)

(Co)variance component^a   Estimate          Parameter^b   Estimate
$\sigma_a^2$               254.47 ± 8.54     $h_a^2$       0.32
$\sigma_m^2$               161.17 ± 9.28     $h_m^2$       0.20
$\sigma_p^2$               94.12 ± 5.38      $p^2$         0.12
$\sigma_e^2$               408.18 ± 4.78     $e^2$         0.52
$\sigma_{am}$              –128.61 ± 7.76    $r_{am}$      –0.63 ± 0.02

a $\sigma_a^2$ = direct additive genetic variance.
$\sigma_m^2$ = maternal additive genetic variance.
$\sigma_p^2$ = maternal permanent environmental variance.
$\sigma_e^2$ = residual variance.
$\sigma_{am}$ = direct by maternal additive genetic covariance.

b Variance component as a proportion of phenotypic variance ($\sigma_t^2$).
$\sigma_t^2 = \sigma_a^2 + \sigma_m^2 + \sigma_{am} + \sigma_p^2 + \sigma_e^2$
For $\sigma_{am}$, genetic correlation is shown.

Table 3.5. Multi-breed additive genetic changes in pre-weaning gain per year obtained through regression of average estimated breeding values of purebred calves on birth year (Average) and through regression of estimated breeding values on contribution of each breed to the breed composition of the calves (Regression)

Breed       Average (kg)              Regression (kg)
Angus       0.28 ± 0.05 (0.14%)^a     0.28 ± 0.04 (0.14%)
Charolais   0.22 ± 0.02 (0.11%)       0.17 ± 0.02 (0.08%)
Hereford    0.35 ± 0.03 (0.17%)       0.27 ± 0.02 (0.13%)
Limousin    0.10 ± 0.03 (0.05%)       0.12 ± 0.02 (0.06%)
Simmental   0.27 ± 0.03 (0.13%)       0.25 ± 0.02 (0.12%)

a Values between parentheses were expressed as percentages of the overall phenotypic average.

Table 3.6. Estimates and standard errors of direct and maternal dominance (H) and epistatic loss (E) effects on pre-weaning gain (kg)

     Direct                     Maternal
H    2.67 ± 0.20 (1.31%)^a      4.64 ± 0.83 (2.28%)
E    –4.45 ± 1.63 (–2.19%)      –0.16 ± 0.39 (–0.08%)

a Values between parentheses are expressed relative to the overall phenotypic average.

Table 3.7. Estimates (as deviations from Angus) and standard errors of direct and

maternal breed additive effects for pre-weaning gain (kg)

Breed Direct Maternal

Blond D’Aquitane 5.34 ± 4.10 –5.68 ± 2.32

Charolais 13.19 ± 3.61 –2.88 ± 1.86

Gelbvieh 10.41 ± 4.69 7.91 ± 3.12

Hereford –6.26 ± 3.64 –3.21 ± 1.86

Limousin –3.07 ± 3.66 0.61 ± 1.90

Maine-Anjou 12.29 ± 4.59 0.25 ± 2.52

Salers 0.60 ± 4.27 7.55 ± 2.49

Shorthorn –9.30 ± 4.29 4.57 ± 2.23

Simmental 14.19 ± 3.64 5.33 ± 1.87

Table 3.8. Sampling correlations among estimates of direct (D) and maternal (M)

fixed genetic effects

HD a ED AND BDD CHD GVD HED LMD MAD SAD SHD SMD

HD 1.00

ED –0.01 1.00

AND 0.00 0.32 1.00

BDD –0.01 0.28 0.84 1.00

CHD 0.01 0.32 0.94 0.86 1.00

GVD –0.01 0.24 0.73 0.67 0.74 1.00

HED 0.04 0.31 0.93 0.84 0.95 0.72 1.00

LMD –0.00 0.31 0.93 0.85 0.96 0.73 0.94 1.00

MAD 0.00 0.23 0.74 0.67 0.75 0.58 0.74 0.75 1.00

SAD –0.01 0.26 0.79 0.71 0.80 0.61 0.79 0.79 0.64 1.00

SHD 0.04 0.25 0.77 0.70 0.79 0.60 0.78 0.78 0.64 0.66 1.00

SMD 0.02 0.32 0.94 0.85 0.96 0.73 0.94 0.95 0.75 0.79 0.78 1.00

HM –0.01 –0.98 –0.31 –0.28 –0.32 –0.24 –0.31 –0.31 –0.23 –0.25 –0.24 –0.31

EM –0.01 –0.00 –0.01 –0.01 –0.01 –0.00 –0.01 –0.01 –0.00 –0.00 –0.00 –0.01

ANM 0.01 –0.30 –0.96 –0.81 –0.91 –0.70 –0.89 –0.90 –0.71 –0.76 –0.74 –0.90

BDM 0.06 –0.24 –0.74 –0.89 –0.76 –0.59 –0.74 –0.75 –0.59 –0.63 –0.61 –0.75

CHM 0.03 –0.31 –0.92 –0.83 –0.97 –0.72 –0.92 –0.93 –0.73 –0.78 –0.76 –0.93

GVM 0.05 –0.18 –0.55 –0.50 –0.55 –0.77 –0.54 –0.55 –0.44 –0.46 –0.45 –0.55

HEM –0.03 –0.30 –0.91 –0.82 –0.93 –0.70 –0.97 –0.92 –0.72 –0.77 –0.77 –0.92

LMM 0.07 –0.30 –0.90 –0.82 –0.92 –0.71 –0.90 –0.96 –0.72 –0.76 –0.75 –0.91

MAM 0.00 –0.21 –0.67 –0.61 –0.69 –0.53 –0.67 –0.68 –0.91 –0.58 –0.58 –0.68

SAM 0.05 –0.22 –0.67 –0.61 –0.68 –0.53 –0.67 –0.68 –0.54 –0.86 –0.56 –0.68

SHM –0.04 –0.23 –0.74 –0.67 –0.76 –0.58 –0.75 –0.75 –0.61 –0.63 –0.96 –0.75

SMM 0.01 –0.31 –0.91 –0.83 –0.93 –0.71 –0.91 –0.92 –0.73 –0.77 –0.76 –0.97

Table 3.8. Continuation …

HM EM ANM BDM CHM GVM HEM LMM MAM SAM SHM SMM

HM 1.00

EM 0.00 1.00

ANM 0.30 0.01 1.00

BDM 0.25 0.00 0.74 1.00

CHM 0.32 0.01 0.92 0.77 1.00

GVM 0.19 0.00 0.55 0.47 0.56 1.00

HEM 0.31 0.02 0.91 0.75 0.94 0.55 1.00

LMM 0.30 0.00 0.90 0.76 0.94 0.56 0.92 1.00

MAM 0.21 0.01 0.68 0.56 0.69 0.41 0.68 0.69 1.00

SAM 0.22 0.00 0.67 0.56 0.69 0.41 0.68 0.68 0.52 1.00

SHM 0.24 0.01 0.74 0.61 0.77 0.45 0.77 0.75 0.58 0.56 1.00

SMM 0.31 0.01 0.91 0.76 0.94 0.56 0.93 0.92 0.69 0.69 0.76 1.00

a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,

CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,

SA = Salers, SH = Shorthorn, and SM = Simmental.

[Figure 3.1: bar chart. Vertical axis: Percentage (0 to 90). Horizontal axis: Calves, Sires, Dams. Series: 1 breed, 2 breeds, 3 breeds, 4 breeds.]

Figure 3.1. Percentage of calves, sires, and dams with 1, 2, 3, or 4 breeds in the genetic composition in the dataset containing 478,466 calves, 19,908 sires, and 234,608 dams

[Figure 3.2: three bar-chart panels. Vertical axes: Number of calves (0 to 180,000), Number of sires (0 to 4,500), Number of dams (0 to 70,000). Horizontal axis: Breed (AN, BD, CH, GV, HE, LM, MA, SA, SH, SM, UN). Series: Purebred, Crossbred.]

Figure 3.2. Number of purebred and crossbred calves, sires, and dams containing some portion of the indicated breed in the dataset including 478,466 calves, 19,908 sires, and 234,608 dams

[Figure 3.3: bar chart. Vertical axis: Number of calves (0 to 80,000). Horizontal axis: Breed. Series: Purebred, Crossbred.]

Figure 3.3. Numbers of purebred and crossbred (expressed as equivalent to purebred) calves per breed

[Figure 3.4: five line-chart panels (Angus, Charolais, Hereford, Limousin, Simmental). Vertical axis: Estimated breeding value (kg). Horizontal axis: Birth year (1986 to 2000). Series: Average, Regression.]

Figure 3.4. Multi-breed additive genetic changes in pre-weaning gain obtained through average estimated breeding values of purebred calves per birth year (Average) and through regression of yearly estimated breeding values on contribution of each breed to the breed composition of the calves (Regression)

Chapter 4

Estimation of genetic effects in the presence of

multicollinearity

ABSTRACT - A framework using generalized ridge regression methods was developed

for obtaining stable estimates of direct and maternal breed additive, dominance, and

epistatic loss effects when multicollinearity among predictor variables is of concern.

Pre-weaning gain records of calves collected through Beef Improvement Ontario (BIO), from

1986 to 1999, were analyzed. The genetic model included fixed genetic effects of breed,

dominance, and epistatic loss, fixed environmental effects of age of the calf,

contemporary group, and age of the dam by sex of the calf, random additive direct and

maternal genetic effects, and random maternal permanent environment effect. The degree

and the nature of the multicollinearity among predictor variables of breed additive,

dominance, and epistatic loss effects were identified and ridge regression methods were

used as an alternative to ordinary least squares (LS). Ridge parameters were obtained

using two different objective methods: Generalized ridge estimator of Hoerl and Kennard

(R1) and bootstrap in combination with cross-validation (R2). Estimates from R1 and R2

were compared to estimates from LS on the basis of mean squared error of predictions

(MSEP) and variance inflation factors (VIF), computed over one hundred bootstrap

samples. Both ridge regression methods outperformed the LS estimator with respect to

MSEP and VIF. The MSEP of R1 and R2 were similar, and both were 3% lower than the MSEP

of LS. Average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively.

Ridge regression methods were particularly effective in reducing the multicollinearity

involving predictor variables of breed additive effects. Due to a high degree of

confounding between estimates of maternal dominance and direct epistatic loss effects it

was not possible to compare the relative importance of effects with a high level of

confidence. The inclusion of epistatic loss effects in the additive-dominance model did

not cause noticeable re-ranking of sires, dams, and calves based on across-breed

estimated breeding values. More stable estimates of breed effects as a result of this study

will contribute to more accurate across-breed estimated breeding values, which is an

important criterion of selection in beef cattle.

Key words: additive, bootstrap, crossbreeding, dominance, epistatic loss, genetic

evaluation, non-additive, ridge regression.

Abbreviations: ABC, across-breed estimated breeding value; CI, condition index; D,

dominance effect; E, epistatic loss effect; EBV, estimated breeding value; ED, coefficient

of direct epistatic loss effect; EM, coefficient of maternal epistatic loss effect; HD,

coefficient of direct dominance effect; HM, coefficient of maternal dominance effect;

MSE, mean square error; MSEP, mean squared error of prediction; VIF, variance

inflation factor.

INTRODUCTION

Breed additive, dominance, and epistatic loss effects are of concern in the genetic

evaluation of a multi-breed population. For estimating these effects, a multiple regression

equation including predictor variables such as breed composition of the calf and of the

dam, expected direct and maternal breed heterozygosities, and functions of the

heterozygosities can be used (Koch et al., 1985; Pimentel et al., 2004). Interpretation of

the estimates depends on the assumption that the predictor variables are not strongly

interrelated.

If there are strong linear relationships among predictor variables, the interpretation of

the corresponding estimates may not be valid because it is difficult to estimate the unique

effect of an individual variable in the regression equation. Typically, when strong linear

relationships exist, the regression coefficients have large standard errors, may have signs

that are opposite to what would be expected, and are sensitive to changes in the data file and

to addition or deletion of variables in the model, making modeling very confusing

(Belsley, 1991). In addition, when considered in combination, the estimated coefficients

often cancel out. This problem is known as collinearity or multicollinearity (Weisberg,

1985). In the presence of multicollinearity, least squares estimates are not adequate

because they are unstable.

Various alternatives to least squares have been suggested to deal with

multicollinearity problems. One such alternative is the ridge regression, which was

introduced by Hoerl (1962) and Hoerl and Kennard (1970a, 1970b). The ridge estimators

are biased, but might be useful in providing estimates that are more precise and, therefore,

more stable than least squares estimates when multicollinearity is of concern.

The ridge estimator is obtained by solving the system of equations

$$(X'X + kI)\,\hat{b}_k = X'y$$

to give

$$\hat{b}_k = (X'X + kI)^{-1} X'y,$$

where k is the ridge parameter or perturbation constant, with k > 0, and I is an identity matrix. In a generalized form, kI is replaced by a matrix K, where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, with $k_i \ge 0$. Many methods have been proposed in the literature for selecting appropriate $k_i$ values (Gruber, 1998), but there is no consensus on which method is the most adequate.

In general, the best method to estimate an optimal K depends on the data and model used.
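The ridge system above can be sketched numerically as follows. This is a minimal illustration on simulated data, not the software used in this study; the function name `ridge_estimate` and the toy predictors are assumptions made for the example.

```python
import numpy as np

def ridge_estimate(X, y, k):
    """Solve (X'X + K) b = X'y, where k is a scalar or a length-p vector of k_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # A scalar k gives K = kI; a vector gives the generalized form diag(k_1..k_p).
    K = np.diag(np.broadcast_to(np.asarray(k, dtype=float), (p,)))
    return np.linalg.solve(X.T @ X + K, X.T @ y)

# Simulated data with two nearly collinear predictors (illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=200)

b_ls = ridge_estimate(X, y, 0.0)     # k = 0 reproduces least squares
b_ridge = ridge_estimate(X, y, 1.0)  # k > 0 shrinks the solution vector
```

With a common k > 0 the length of the solution vector is always smaller than that of the least squares solution, which is the source of both the gain in stability and the bias.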

From a Bayesian viewpoint (Goldstein and Smith, 1974; Draper and Smith, 1998;

Sorensen and Gianola, 2002), the ridge regression can be considered as an estimate of b

from the data subject to prior knowledge about the parameter, which is supplemented by

the ridge parameter k. Given that $k = \sigma^2/\sigma_b^2$, where $\sigma^2$ is the residual variance and $\sigma_b^2$ is a measure of the spread of the elements of b, large values of k imply an a priori belief that more restricted values of b are more likely than larger values, while small values of k imply an a priori belief that quite a large range of values of b is not unreasonable.

The objective of this study was to develop a framework using ridge regression

methods for obtaining stable estimates of direct and maternal breed additive, dominance,

and epistatic loss genetic effects in the presence of multicollinearity.


Data

The data were pre-weaning weight gain of animals from beef herds enrolled with

Beef Improvement Ontario (BIO), from 1986 to 1999. The dataset after preliminary edits

consisted of 869,050 records, including records of both purebred and crossbred animals.

A subset including purebred and crossbred animals from the 10 most common breeds

(Angus, Blond D’Aquitane, Charolais, Gelbvieh, Hereford, Limousin, Maine-Anjou,

Salers, Shorthorn, and Simmental), containing 478,466 records was chosen for the

analysis. Portions of undetermined breed in the breed composition were treated as another

breed, named Unknown (UN). A summary of the data is presented in Chapter 3.

Predictor variables of fixed genetic effects

(1) Breed additive effects

Coefficients of direct and maternal breed additive effects were equal to the genetic

contribution of each breed to the breed composition of the calf and to the breed

composition of the dam, respectively. Linear dependencies among breed additive effects

required a restriction to obtain these estimates. The estimates for direct and maternal

breed additive effects were reported relative to the Angus breed.

(2) Dominance effects

Coefficients of direct dominance (HD) and maternal dominance (HM) genetic effects

were equal to expected direct and maternal breed heterozygosities, respectively. HD and

HM were calculated using the following equations:

$$HD = 1 - \sum_{i=1}^{nb} S_i \times D_i$$

and

$$HM = 1 - \sum_{i=1}^{nb} MGS_i \times MGD_i,$$

where nb is the number of breeds (11), and $S_i$, $D_i$, $MGS_i$, and $MGD_i$ are the fractions of the ith breed in the sire, dam, maternal grandsire, and maternal granddam breed composition, respectively.
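As a small worked illustration of the two heterozygosity equations, the sketch below computes HD and HM for a few two-breed matings. The function name `breed_heterozygosity` and the dictionary representation of breed composition are assumptions made for the example only.

```python
def breed_heterozygosity(comp_a, comp_b):
    """Expected breed heterozygosity: 1 - sum over breeds of a_i * b_i."""
    breeds = set(comp_a) | set(comp_b)
    return 1.0 - sum(comp_a.get(br, 0.0) * comp_b.get(br, 0.0) for br in breeds)

# Breed compositions as fractions per breed (two-breed example).
A = {"A": 1.0}
B = {"B": 1.0}
AB = {"A": 0.5, "B": 0.5}

hd_f1 = breed_heterozygosity(A, B)          # sire A x dam B
hd_backcross = breed_heterozygosity(A, AB)  # sire A x dam AB
hm_backcross = breed_heterozygosity(A, B)   # dam AB has maternal grandsire A, granddam B
```

These values (1, ½, and 1) match the F1 and backcross rows of Table 3.1.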

(3) Epistatic loss effects

For estimating epistatic loss effects, the parents of an individual were assumed to

produce more recombinant gametes the larger their breed heterozygosities. Thus, the

coefficients of epistatic loss effects for direct (ED) and maternal (EM) effects were

calculated as the average breed heterozygosities in uniting gametes that generated the

individual (Fries et al., 2000). ED and EM were calculated as:

ED = 0.5 (HSire + HDam)

and

EM = 0.5 (HMGS + HMGD),

where HSire, HDam, HMGS, and HMGD are the expected breed heterozygosities of the sire,

dam, maternal grandsire, and maternal granddam, respectively.
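The four coefficients can be derived together from a small pedigree, as in the following sketch. The `Animal` class, the helper names, and the convention that animals with unknown parents are purebred founders with zero heterozygosity are assumptions made for this example; they are not taken from the programs used in the study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Animal:
    comp: dict                        # breed -> fraction of that breed
    sire: Optional["Animal"] = None
    dam: Optional["Animal"] = None

def het(animal):
    """Breed heterozygosity of an animal; 0 for founders (assumed purebred)."""
    if animal is None or animal.sire is None or animal.dam is None:
        return 0.0
    s, d = animal.sire.comp, animal.dam.comp
    return 1.0 - sum(s.get(b, 0.0) * d.get(b, 0.0) for b in set(s) | set(d))

def mate(sire, dam):
    """Offspring whose breed composition is the average of its parents."""
    breeds = set(sire.comp) | set(dam.comp)
    comp = {b: 0.5 * (sire.comp.get(b, 0.0) + dam.comp.get(b, 0.0)) for b in breeds}
    return Animal(comp, sire, dam)

def coefficients(calf):
    dam = calf.dam
    return {
        "HD": het(calf),                              # heterozygosity of the calf
        "HM": het(dam),                               # heterozygosity of the dam
        "ED": 0.5 * (het(calf.sire) + het(dam)),      # 0.5 (HSire + HDam)
        "EM": 0.5 * (het(dam.sire) + het(dam.dam)),   # 0.5 (HMGS + HMGD)
    }

A, B = Animal({"A": 1.0}), Animal({"B": 1.0})
f1_sire, f1_dam = mate(A, B), mate(B, A)
backcross = mate(A, f1_dam)               # sire A x dam AB
from_f1_parents = mate(f1_sire, f1_dam)   # row "F1 x F1" of Table 3.1
from_f2_parents = mate(mate(f1_sire, f1_dam), mate(f1_sire, f1_dam))  # row "F2 x F2"
```

Calling `coefficients` on these animals reproduces the corresponding rows of Table 3.1.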

Multicollinearity diagnostics

For identifying possible linear dependencies between the covariates included in the

analysis, different measures of the degree of multicollinearity were obtained.

(1) Variance inflation factor

The variance inflation factor is the most popular measure of multicollinearity. If $R_i^2$ is the coefficient of determination resulting when the predictor variable $X_i$ is regressed on all the remaining predictor variables, the variance inflation factor for $X_i$ ($VIF_i$) is given by

$$VIF_i = \frac{1}{1 - R_i^2}.$$

In ordinary least squares (LS), the VIFs are the diagonal elements of the inverse of

the simple correlation matrix. The VIF indicates the inflation in the variance of each

regression coefficient over a situation of orthogonality. The magnitude of a VIF to be

considered high is essentially arbitrary. Usually, values larger than 10 suggest that

multicollinearity may be causing estimation problems (Chatterjee et al., 2000).
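The equivalence between the two definitions of the VIF can be checked numerically. The sketch below uses simulated predictors; the function name `variance_inflation_factors` and the data are assumptions made for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIFs as the diagonal of the inverse correlation matrix of the predictors."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

# Simulated predictors: x1 and x2 nearly collinear, x3 independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)  # large for x1 and x2, close to 1 for x3
```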

(2) Condition index

In the presence of multicollinearity, the determinant of the correlation matrix is very

low. Because the determinant is also equal to the product of the eigenvalues $\lambda_i$, the presence of one or more small eigenvalues results in a small determinant, thereby indicating multicollinearity. A measure of multicollinearity called condition index (CI) is obtained for each eigenvalue by

$$CI_i = \sqrt{\frac{\lambda_{max}}{\lambda_i}},$$

where $\lambda_{max}$ is the largest eigenvalue and $\lambda_i$ is the ith eigenvalue of the correlation matrix. A large $CI_i$ indicates dependencies among covariates, since $\lambda_i$ will be close to zero. Belsley (1991) suggests that CI between 10 and 30 are of interest, indicating possible problems of multicollinearity, and that CI larger than 30 provide reasonable evidence of considerable multicollinearity.

(3) Variance-decomposition proportions associated with the eigenvalues

This statistic indicates which variables are involved in linear dependencies and how

much of the variance of the parameter estimate is associated with each eigenvalue.

Following Belsley (1991),

$$Var(\hat{b}) = \sigma^2 (X'X)^{-1} = \sigma^2 V \Lambda^{-1} V',$$

where $\sigma^2$ is the residual variance estimate, V is a matrix containing the eigenvectors, and $\Lambda$ is a diagonal matrix of eigenvalues, $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$. Writing $V = (v_{ij})$, the variance of the ith component of the regression coefficient vector $\hat{b}$ can be decomposed into a sum of components, each associated with one eigenvalue, as

$$Var(\hat{b}_i) = \sigma^2 \sum_{j=1}^{p} \frac{v_{ij}^2}{\lambda_j},$$

where p is the number of predictor variables.

Because the eigenvalues appear in the denominator, those components of the variance associated with dependencies (small $\lambda_j$) will be relatively large compared to the other components. Thus, a high proportion of the variance of two or more coefficients associated with the same small eigenvalue provides evidence that the corresponding dependencies are causing problems.

Let $t_{ij} = v_{ij}^2/\lambda_j$ and $t_i = \sum_{j=1}^{p} t_{ij}$, with i = 1, … , p. The proportion of the variance of the ith regression coefficient associated with the jth component of its decomposition is obtained as

$$\pi_{ji} = \frac{t_{ij}}{t_i}, \quad \text{with } i, j = 1, \ldots, p.$$

An approach recommended by Belsley et al. (1980) is to identify eigenvalues $\lambda_j$ having CI greater than 30. Variables with variance-decomposition proportions $\pi_{ji}$ larger than 0.5 for each of these eigenvalues are candidates for linear dependencies. The

measures of multicollinearity were obtained using the regression procedure, option

COLLINOINT, of the SAS statistical software (SAS Institute Inc., 1990).
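For readers without access to SAS, the condition indices and variance-decomposition proportions can be approximated from the correlation matrix of the predictors, as in the sketch below. It assumes the condition index is the square root of the eigenvalue ratio; the function name `collinearity_diagnostics` and the simulated data are illustrative assumptions and are not meant to reproduce SAS output exactly.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Condition indices and variance-decomposition proportions from the correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    lam, V = np.linalg.eigh(R)            # columns of V are the eigenvectors
    ci = np.sqrt(lam.max() / lam)         # one condition index per eigenvalue
    t = V**2 / lam                        # t[i, j] = v_ij^2 / lambda_j
    pi = (t / t.sum(axis=1, keepdims=True)).T  # pi[j, i]: share of Var(b_i) tied to eigenvalue j
    return ci, pi

# Simulated predictors: x1 and x2 nearly collinear, x3 independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
ci, pi = collinearity_diagnostics(np.column_stack([x1, x2, x3]))

worst = int(np.argmax(ci))  # eigenvalue with the largest condition index
involved = pi[worst] > 0.5  # predictors flagged as part of the dependency
```

In this toy case the first two predictors are flagged and the third is not.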

Genetic analysis

The genetic model for pre-weaning gain, defined in matrix notation, was:

y = Xb + Fv + Za + Wm + Sp + e, (1)

where

y = vector of observations;

b = vector of fixed genetic effects. This vector included direct and maternal breed

additive, dominance, and epistatic loss effects;

v = vector of fixed environmental effects. This vector included age of the calf as a

covariate (linear and quadratic effects), and age of the dam by sex of the calf and

contemporary group (herd-year-season-management group) as classification variables;

a = vector of random direct additive genetic effects;

m = vector of random maternal additive genetic effects;

p = vector of random maternal permanent environment, and

e = vector of random residual effects.

X, F, Z, W, and S are incidence matrices relating records to fixed genetic, fixed

environmental, direct genetic, maternal genetic, and permanent environment effects,

respectively.

Random effects a, m, p, and e were assumed to have variance matrices equal to $A\sigma_a^2$, $A\sigma_m^2$, $I\sigma_p^2$, and $I\sigma_e^2$, respectively, where A is the additive numerator relationship matrix among animals and I is an identity matrix. Covariance between a and m was assumed equal to $A\sigma_{am}$. The estimates of $\sigma_a^2$, $\sigma_m^2$, $\sigma_{am}$, $\sigma_p^2$, and $\sigma_e^2$ used in the analyses were those

reported in Chapter 3. Homogeneity of variances and the same dominance and epistatic

loss effects for crosses of different pairs of breeds, and no interactions between genetic

and environmental effects were assumed.

The solutions for the genetic model (1) were obtained through the following

procedure:

Step 1: Obtain solutions for v, a, m, and p, using the model

$$y_1 = Fv + Za + Wm + Sp + e_1,$$

where $y_1 = y - X\hat{b}$. In the first iteration $\hat{b}$ was set to values obtained by LS. The DMU program (Madsen and Jensen, 2000) was used.

Step 2: Using the ridge regression technique, obtain solutions for b, using the model

$$y_2 = Xb + e_2,$$

where $y_2 = y - F\hat{v} - Z\hat{a} - W\hat{m} - S\hat{p}$, and $\hat{v}$, $\hat{a}$, $\hat{m}$, and $\hat{p}$ are solutions obtained in

the first step. The programs to run step 2 were developed using the Fortran language and

the IML procedure of SAS statistical software (SAS Institute Inc., 1990).

Steps 1 and 2 were repeated until convergence. The convergence was attained

when the largest absolute difference between the solutions in b in the current and in the

previous iteration was smaller than $10^{-4}$. The final estimates for b obtained in the second

step are equal to estimates obtained in a model where all effects are solved

simultaneously. However, the standard errors of the estimates are smaller because the

number of parameters estimated in the sub-model is smaller than would be in the full

model.
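The alternation between the two steps can be illustrated with a deliberately simplified sketch. It assumes a toy model in which step 1 contains only fixed environmental effects solved by least squares (the random effects a, m, and p and the DMU program are omitted), so it shows only why the iterated solution for b agrees with the solution of the joint system. The function name `two_step_solve` and the simulated data are assumptions.

```python
import numpy as np

def two_step_solve(X, F, y, K, tol=1e-10, max_iter=10_000):
    """Alternate between solving for v given b (step 1) and ridge b given v (step 2)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # starting values for b by LS
    v = np.zeros(F.shape[1])
    for _ in range(max_iter):
        v = np.linalg.lstsq(F, y - X @ b, rcond=None)[0]          # step 1 on y - Xb
        b_new = np.linalg.solve(X.T @ X + K, X.T @ (y - F @ v))   # step 2 on y - Fv
        if np.max(np.abs(b_new - b)) < tol:
            return b_new, v
        b = b_new
    return b, v

# Simulated data: two covariates and three contemporary groups (illustrative only).
rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 2))
groups = np.arange(n) % 3
F = np.eye(3)[groups]
y = X @ np.array([2.0, -1.0]) + F @ np.array([10.0, 12.0, 8.0]) + rng.normal(size=n)
K = np.diag([0.5, 0.5])

b_iter, v_iter = two_step_solve(X, F, y, K)

# Joint system in which both blocks of effects are solved simultaneously.
lhs = np.block([[X.T @ X + K, X.T @ F], [F.T @ X, F.T @ F]])
rhs = np.concatenate([X.T @ y, F.T @ y])
joint = np.linalg.solve(lhs, rhs)
```

In this toy setting the iterated solution for b matches the first two elements of `joint` to within the convergence tolerance.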

Ridge regression

The usual model for a multiple linear regression is

$$y = Xb + \varepsilon,$$

where y is an (n × 1) vector of observations, X is an (n × p) design matrix of rank p, and $\varepsilon$ is an (n × 1) vector of random residuals with assumptions $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = I\sigma^2$. The unknown parameter vector, b, using the least squares criterion, is estimated by solving $(X'X)\hat{b} = X'y$ to give $\hat{b} = (X'X)^{-1}X'y$. Estimates and corresponding variances could be unreliable in the presence of multicollinearity. The ridge regression estimator consists of adding a small positive amount to the diagonal of the $X'X$ matrix, causing a reduction in the variance of the estimates at the expense of introducing some bias. Thus, the ridge regression estimator of b takes the general form

$$\hat{b}_k = (X'X + K)^{-1}X'y,$$

where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, $k_i \ge 0$. When all $k_i$ elements are equal to zero, $\hat{b}_k$ reduces to the LS estimator.

The variance-covariance matrix of $\hat{b}_k$ is

$$Var(\hat{b}_k) = (X'X + K)^{-1}X'X(X'X + K)^{-1}\sigma^2.$$

The mean square error (MSE), which is a measure of the expected squared distance of $\hat{b}_k$ to b, is

$$MSE = E[(\hat{b}_k - b)'(\hat{b}_k - b)] = \mathrm{trace}[Var(\hat{b}_k)] + b'(Z - I)'(Z - I)b,$$

that is,

MSE = Total variance + (Bias)²,

where $Z = (X'X + K)^{-1}X'X$.

The variance inflation factors of the ridge regression coefficients are the diagonal elements of the matrix $(X'X + K)^{-1}X'X(X'X + K)^{-1}$.

In the present study, the ridge regression analyses were carried out in the standardized

form of the model using the correlation matrix. After estimation, the estimates were

transformed to the original scale and were presented in this way.
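A short numerical sketch of the ridge variance inflation factors in the standardized (correlation) form is given below. The 2 × 2 correlation matrix and the value k = 0.1 are arbitrary choices for illustration, and `ridge_vif` is a hypothetical helper name.

```python
import numpy as np

def ridge_vif(R, k):
    """Diagonal of (R + K)^-1 R (R + K)^-1, with R the predictor correlation matrix."""
    p = R.shape[0]
    K = np.diag(np.broadcast_to(np.asarray(k, dtype=float), (p,)))
    A = np.linalg.inv(R + K)
    return np.diag(A @ R @ A)

# Two highly correlated predictors (r = 0.99), chosen for illustration.
R = np.array([[1.00, 0.99],
              [0.99, 1.00]])

vif_ls = ridge_vif(R, 0.0)     # K = 0 gives the LS values, diag(R^-1)
vif_ridge = ridge_vif(R, 0.1)  # a small k sharply reduces the inflation
```

With K = 0 the expression reduces to the diagonal of the inverse correlation matrix, so the LS and ridge VIFs are directly comparable.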

Objective methods for selecting the ridge parameter K

The optimal value of the ridge parameter K, which results in smaller MSE than that

obtained with LS, depends on the unknown parameter vector b and the unknown error

variance $\sigma^2$ (Hoerl and Kennard, 1970a). As a consequence, K must be determined

empirically or estimated from the data. In this study the ridge parameter K was estimated

through two objective methods.

(1) Generalized Ridge Estimator of Hoerl and Kennard (R1)

In the Generalized Ridge Regression Estimator of Hoerl and Kennard (Hoerl and

Kennard, 1970a), an orthogonal transformation V is applied to reduce $X'X$ to a diagonal matrix. We have that

$$V'(X'X)V = \Lambda,$$

where V is a (p × p) orthogonal matrix whose columns $v_1, v_2, \ldots, v_p$ are the eigenvectors of $X'X$ and $\Lambda$ is a diagonal matrix of eigenvalues of $X'X$. Writing

$$X^* = XV$$

and

$$\alpha = V'b,$$

then the model $y = Xb + \varepsilon$ can be written as

$$y = X^*\alpha + \varepsilon,$$

where

$$(X^*)'(X^*) = \Lambda.$$

The generalized ridge regression procedure is then defined as

$$\hat{\alpha}_k = [(X^*)'(X^*) + K]^{-1}(X^*)'y,$$

where K is a diagonal matrix with non-negative diagonal elements $k_1, k_2, \ldots, k_p$.

Hoerl and Kennard (1970a) showed that theoretical optimal values for $k_i$ are given by $k_i = \sigma^2/\alpha_i^2$. The authors suggested an iterative procedure to estimate $k_i$. This procedure may be summarized as follows.

1. Reduce the system to canonical form.

2. Take the least squares estimates as the starting point to compute $\hat{k}_i^{(0)} = \hat{\sigma}^2/(\hat{\alpha}_i^{(0)})^2$, i = 1, 2, … , p, where $\hat{\sigma}^2$ is the LS estimator of $\sigma^2$ and the superscript (j) denotes the jth iteration.

3. Use the $\hat{k}_i^{(j)}$ values in the ridge regression equation to obtain $\hat{\alpha}_i^{(j+1)}$.

4. Compute a new estimate for $k_i$ using $\hat{k}_i^{(j+1)} = \hat{\sigma}^2/(\hat{\alpha}_i^{(j+1)})^2$.

Go to step 3 until convergence of $\hat{k}_i$. Convergence was achieved when the differences between the $\hat{k}_i$ of two successive iterations were smaller than $10^{-7}$. After convergence the estimates $\hat{\alpha}_k$ were converted back to $\hat{b}_k$ through the equation

$$\hat{b}_k = V\hat{\alpha}_k.$$
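The iterative procedure can be sketched as follows. This is an illustrative reading of the four steps, not the Fortran/SAS IML programs used in the study. Two safeguards are assumptions added for the example: an upper bound `k_max` on the ridge parameters (some $k_i$ can grow without limit as the corresponding $\hat{\alpha}_i$ shrinks towards zero) and a maximum number of iterations.

```python
import numpy as np

def generalized_ridge_hk(X, y, tol=1e-7, max_iter=1000, k_max=1e12):
    """Iterative generalized ridge in canonical form; returns (b_k, k)."""
    n, p = X.shape
    lam, V = np.linalg.eigh(X.T @ X)   # columns of V are eigenvectors of X'X
    Xs = X @ V                         # canonical predictors, Xs'Xs = diag(lam)
    alpha_ls = (Xs.T @ y) / lam        # LS solution in canonical form
    resid = y - Xs @ alpha_ls
    sigma2 = resid @ resid / (n - p)   # LS estimate of the residual variance

    def update_k(alpha):
        # k_i = sigma^2 / alpha_i^2, bounded above by k_max (assumed safeguard)
        return np.minimum(sigma2 / np.maximum(alpha**2, 1e-200), k_max)

    k = update_k(alpha_ls)             # step 2: start from least squares
    for _ in range(max_iter):
        alpha = lam * alpha_ls / (lam + k)   # step 3: ridge solution for current k
        k_new = update_k(alpha)              # step 4: new estimate of k
        converged = np.max(np.abs(k_new - k)) < tol
        k = k_new
        if converged:
            break
    alpha = lam * alpha_ls / (lam + k)
    return V @ alpha, k                # back-transform to the original predictors

# Simulated collinear data (illustrative only).
rng = np.random.default_rng(5)
x1 = rng.normal(size=300)
x2 = x1 + 0.1 * rng.normal(size=300)
X = np.column_stack([x1, x2, rng.normal(size=300)])
y = x1 + x2 + rng.normal(size=300)

b_k, k_hat = generalized_ridge_hk(X, y)
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because every canonical coefficient is multiplied by $\lambda_i/(\lambda_i + k_i) \le 1$, the ridge solution vector is never longer than the least squares one.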

(2) Bootstrap in combination with cross-validation (R2)

The bootstrap and cross-validation approach for estimating the ridge parameter, originally suggested by Delaney and Chatterjee (1986), was extended to consider the instability of each predictor variable. The elements $k_i$ of the ridge parameter K were estimated by

$$\hat{k}_i = \theta_b \, \frac{VIF_i}{\max(VIF)},$$

where $VIF_i$ is the variance inflation factor of the ith predictor variable. A value $\theta_b$ has to be chosen to generate a K matrix that minimizes the mean squared error of prediction (MSEP). The magnitude of the elements $\hat{k}_i$ of the ridge parameter matrix K will be proportional to the variance inflation of each predictor variable. As a result, unnecessary bias will not be imposed on those predictors not seriously involved in multicollinearity.

The MSEP was estimated combining bootstrap with cross validation (described later).

The bootstrap is a powerful resampling procedure originally proposed by Efron (1979). In

the bootstrap procedure, a random sample of n observations is taken with replacement from a particular population. This sample will contain observations that were chosen more than once, while other observations will not be chosen at all. The sample obtained in this manner is known as a bootstrap sample. If a large number of bootstrap samples is drawn, the estimates of the parameters of interest will approach the true parameter.

A strategy using bootstrap in combination with cross-validation to estimate the ridge

parameter matrix K can be summarized as follows.

1. Select a vector � containing values of � between 0 and 1.

2. Choose a bootstrap sample of n observations with replacement.

83

3. For each bootstrap sample and each value of � obtain K and the ridge estimator

vector kb , where K = diag( 1k 2k … pk ). Use the ridge estimator to predict

observations that were not chosen in the bootstrap sample. If the prediction vector for the

unchosen observations is ky ( K ), the MSEP of the jth bootstrap sample and K ridge

parameter, given �, is

MSEPj( �K ) = [ ky ( �K ) – y]� [ ky ( �K ) – y] / Nj,

where Nj is the number of unchosen observations (randomly determined) in the jth

bootstrap sample.

4. Repeat steps 2 and 3 for B bootstrap samples and obtain a final average of MSEP for each $\theta$ value as

$$\mathrm{MSEP}(\hat{\mathbf{K}}) = \frac{\sum_{j=1}^{B} [\mathrm{MSEP}_j(\hat{\mathbf{K}})](N_j)}{\sum_{j=1}^{B} N_j}.$$

A value of $\theta$ that generates a matrix K of ridge parameters that minimizes the MSEP

is then chosen ($\theta_b$). The MSEP were obtained for values of $\theta$ ranging from zero to one,

with increments of 0.001, on the basis of one hundred bootstrap samples with

replacement.
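
The four steps above can be sketched for an ordinary fixed-effects regression. This is a simplified illustration, not the procedure as implemented in the study, where the fixed genetic effects were solved iteratively within the genetic model. The function name `choose_theta`, the simulated data, the coarse grid of theta values, and the use of 50 bootstrap samples are assumptions made to keep the example short.

```python
import numpy as np

def choose_theta(X, y, thetas, n_boot=100, seed=0):
    """Pick theta by the N_j-weighted average out-of-bag MSEP over bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sse = np.zeros(len(thetas))      # accumulates MSEP_j * N_j for each theta
    n_out = 0                        # accumulates N_j
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # step 2: sample with replacement
        out = np.setdiff1d(np.arange(n), idx)      # unchosen observations
        if out.size == 0:
            continue
        # Standardize with the bootstrap sample's own means and scales.
        mu, y_bar = X[idx].mean(axis=0), y[idx].mean()
        Zc = X[idx] - mu
        scale = np.sqrt((Zc ** 2).sum(axis=0))
        Z = Zc / scale
        R = Z.T @ Z                                # X'X in correlation form
        r = Z.T @ (y[idx] - y_bar)
        vif = np.diag(np.linalg.inv(R))
        Z_out = (X[out] - mu) / scale
        for t, theta in enumerate(thetas):         # step 3: one K per theta
            K = np.diag(theta * vif / vif.max())
            b_k = np.linalg.solve(R + K, r)
            resid = y_bar + Z_out @ b_k - y[out]
            sse[t] += resid @ resid
        n_out += out.size
    msep = sse / n_out                             # step 4: weighted average
    return thetas[int(np.argmin(msep))], msep

# Hypothetical collinear data, with a coarser grid than the 0.001 step used in the text.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=300), rng.normal(size=300)])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=300)
thetas = np.linspace(0.0, 1.0, 21)
theta_b, msep = choose_theta(X, y, thetas, n_boot=50)
```

Summing the squared prediction errors over samples and dividing by the total number of unchosen observations is algebraically the same as the weighted average in step 4, because MSEP_j multiplied by N_j is the sum of squared errors of sample j.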

Mean squared error of prediction and variance inflation factor

After selecting the ridge parameter matrix K for each ridge regression method and

obtaining the ridge regression estimates, one hundred bootstrap samples with replacement

were generated and used for computing average MSEP and VIF. Ridge regression

methods and LS were compared with respect to average MSEP and VIF. A model that

results in lower VIF and smaller MSEP is desirable because these statistics indicate

stability in the estimates and ability of the model to predict future observations.

Bias measurement

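Given that $E(\hat{\mathbf{b}}) = \mathbf{b}$ and $E(\hat{\mathbf{b}}_k) = (\mathbf{X}'\mathbf{X} + \mathbf{K})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{H}\mathbf{b}$, a measurement of the bias of the ridge regression vector $\hat{\mathbf{b}}_k$ was obtained by $\left(1 - \frac{\|\mathbf{H}\|}{\|\mathbf{I}\|}\right) \times 100$, where $\|\cdot\|$ denotes the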

Euclidean norm. Thus, a bias measurement close to zero for a particular ridge regression

method will indicate little bias in the estimates.
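
A minimal sketch of this bias measurement follows. It assumes that the Euclidean norm of a matrix is taken as the Frobenius norm (the default of `numpy.linalg.norm` for two-dimensional arrays); the function name and the small correlation matrix are hypothetical.

```python
import numpy as np

def bias_measure(XtX, K):
    """(1 - ||H|| / ||I||) * 100, with H = (X'X + K)^{-1} X'X.
    Assumption: the matrix norm is the Frobenius norm."""
    p = XtX.shape[0]
    H = np.linalg.solve(XtX + K, XtX)
    return (1.0 - np.linalg.norm(H) / np.linalg.norm(np.eye(p))) * 100.0

# Hypothetical X'X in correlation form for two strongly correlated predictors.
XtX = np.array([[1.0, 0.9],
                [0.9, 1.0]])
bias_ls = bias_measure(XtX, np.zeros((2, 2)))        # K = 0 reproduces LS
bias_small = bias_measure(XtX, np.diag([0.01, 0.01]))
bias_large = bias_measure(XtX, np.diag([0.10, 0.10]))
```

With K equal to zero, H is the identity matrix and the measurement is zero; larger ridge parameters shrink H and increase the measurement.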

Comparison of across-breed estimated breeding values

Across-breed estimated breeding values (ABC) from models that used LS and ridge

regression methods for estimation of fixed genetic effects were compared through

correlations (Pearson and Spearman), and percentages of coincidence for different

proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves. ABC were

calculated by adding EBV and estimates of direct breed additive effects, weighted by the

breed composition of the animal. The following models were considered in the

comparison.

(1) Additive-dominance models

AD-AH: The pre-weaning gain was pre-adjusted for expected heterosis based on

averages from literature. An ad hoc heterosis (direct and maternal) of 5% for an animal

with heterozygosity of 100% was assumed. Breed additive effects were estimated through

LS.

AD-LS: The pre-weaning gain was adjusted for dominance (heterosis) effects using

information from the dataset under investigation. Breed additive and dominance effects

were estimated through LS. Estimates of direct and maternal heterosis were equal to

1.31% and 1.84%, respectively.

AD-R2: This model differed from model AD-LS by the fact that breed additive and

dominance (heterosis) effects were estimated through R2 instead of LS. Estimates of

direct and maternal heterosis were equal to 1.22% and 1.23%, respectively.

(2) Additive-dominance-epistatic models

ADE-LS: Breed additive, dominance and epistatic loss effects were estimated using

LS. Estimates of direct and maternal dominance were equal to 1.31% and 2.28%,

respectively, whereas estimates of direct and maternal epistatic loss were equal to –2.19%

and –0.08%, respectively.

ADE-R1: Breed additive, dominance and epistatic loss effects were estimated using

R1. Estimates of direct and maternal dominance were equal to 1.31% and 1.72%, respectively, and the estimate of direct epistatic loss was equal to –1.04%.

ADE-R2: Breed additive, dominance and epistatic loss effects were estimated using

R2. Estimates of direct and maternal dominance were equal to 1.23% and 1.55%, respectively, and the estimate of direct epistatic loss was equal to –0.66%.

ABC from model ADE-R2 were assumed as the reference estimates for calculating

Pearson and Spearman correlations, and percentages of coincidence with all other models.
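
The comparison criteria can be sketched as follows. ABC are formed as EBV plus breed effects weighted by breed composition, as described above; the percentage of coincidence is interpreted here as the share of animals in the top fraction under the reference model that are also in the top fraction under the other model. The simulated EBV, breed compositions, and breed effects are hypothetical, and the rank function ignores ties.

```python
import numpy as np

def across_breed_values(ebv, breed_comp, breed_effects):
    """ABC = EBV + breed composition (rows sum to one) times direct breed additive effects."""
    return ebv + breed_comp @ breed_effects

def ranks(x):
    """Ranks 0..n-1 without tie handling, sufficient for continuous values."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def coincidence(ref, other, top):
    """Share of the top fraction under `ref` that is also in the top fraction under `other`."""
    n_sel = max(1, int(round(top * len(ref))))
    sel_ref = set(np.argsort(-ref)[:n_sel].tolist())
    sel_other = set(np.argsort(-other)[:n_sel].tolist())
    return len(sel_ref & sel_other) / n_sel

# Hypothetical animals with three breeds and two sets of breed effect estimates.
rng = np.random.default_rng(4)
n = 1000
ebv = rng.normal(size=n)
breed_comp = rng.dirichlet(np.ones(3), size=n)
abc_ref = across_breed_values(ebv, breed_comp, np.array([0.0, 0.8, -0.4]))
abc_alt = across_breed_values(ebv, breed_comp, np.array([0.0, 1.5, 0.3]))

pearson = np.corrcoef(abc_ref, abc_alt)[0, 1]
spearman = np.corrcoef(ranks(abc_ref), ranks(abc_alt))[0, 1]   # Pearson on ranks
by_top = {top: coincidence(abc_ref, abc_alt, top) for top in (0.01, 0.10, 0.20, 0.40)}
```

Overall correlations can remain high while the coincidence among the top 1% is much lower, which is why both criteria are reported.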

RESULTS

Multicollinearity diagnostics

The matrices X'X and X'y in the correlation form are presented in Table 4.1.

Coefficients of maternal dominance and direct epistatic loss effects were strongly

correlated (r = 0.95). Looking within a breed, coefficients of direct and maternal breed

additive effects were always equal to or higher than 0.80. The severity of

multicollinearity, however, should not be quantified by the magnitude of these pairwise

correlations because the interrelation among three or more variables might result in a high

degree of multicollinearity, even when pairwise correlations are low. Better measures of

the degree of multicollinearity are given by the eigenvalues of the correlation matrix and

corresponding condition indices (Table 4.2), variance inflation factors (Figure 4.1), and

variance-decomposition proportions associated with the eigenvalues (Table 4.3).

Eigenvalues and corresponding condition indices are presented in Table 4.2. The last

eigenvalue was very small ($\lambda$ = 0.00189). This eigenvalue was associated with condition

index 38.85, reflecting dependencies between predictor variables. The second smallest

eigenvalue was equal to 0.05078, with corresponding condition index equal to 7.50.
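
As an illustration of how these diagnostics relate, the sketch below assumes the conventional definition of the condition index, the square root of the ratio of the largest eigenvalue to each eigenvalue. Under that assumption, the two reported pairs of eigenvalue and condition index imply nearly the same largest eigenvalue (about 2.85), a value inferred here rather than taken from Table 4.2. The small correlation matrix is hypothetical.

```python
import numpy as np

def condition_indices(R):
    """Eigenvalues in descending order and condition indices sqrt(lambda_max / lambda_i)."""
    eigvals = np.linalg.eigvalsh(R)[::-1]
    return eigvals, np.sqrt(eigvals[0] / eigvals)

# Hypothetical correlation matrix: two strongly correlated predictors and one unrelated.
R = np.array([[1.00, 0.95, 0.00],
              [0.95, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
eigvals, ci = condition_indices(R)

# Largest eigenvalue implied by each reported (eigenvalue, condition index) pair.
lam_max_a = 38.85 ** 2 * 0.00189
lam_max_b = 7.50 ** 2 * 0.05078
```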

Variance inflation factors shown in Figure 4.1 indicate that the variance of the LS

estimates of 16 out of 24 regression coefficients would be inflated by more than 10 fold

(VIF larger than 10) compared to what would be expected in an orthogonal system.

Variance-decomposition proportions associated with the largest condition index (CI =

38.85) suggest that breed composition was the main candidate for the dependencies

(Table 4.3). For 9 direct and 5 maternal breed additive effects, a fraction of the variance

of the estimated regression coefficients larger than 50% was associated with

dependencies indicated by the largest condition index.
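
Variance-decomposition proportions can be computed from the eigen decomposition of the correlation matrix. The sketch below follows the usual formulation, in which the variance of the jth coefficient is proportional to the sum over eigenvalues of the squared eigenvector element divided by the eigenvalue; it is assumed, not confirmed, that the proportions in Table 4.3 were obtained in this way. The matrix is hypothetical.

```python
import numpy as np

def variance_decomposition(R):
    """props[j, k]: share of var(b_j) associated with eigenvalue k of R."""
    eigvals, V = np.linalg.eigh(R)        # ascending eigenvalues, eigenvectors in columns
    phi = V ** 2 / eigvals                # phi[j, k] = v_jk^2 / lambda_k
    props = phi / phi.sum(axis=1, keepdims=True)
    return eigvals, phi, props

# Hypothetical correlation matrix: predictors 1 and 2 are collinear, predictor 3 is not.
R = np.array([[1.00, 0.95, 0.00],
              [0.95, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
eigvals, phi, props = variance_decomposition(R)
share_smallest = props[:, 0]              # column 0 is the smallest eigenvalue
```

Here the two collinear predictors have more than 90% of their variance tied to the smallest eigenvalue, while the unrelated predictor has essentially none, mirroring the 50% threshold used to flag the breed additive effects above. The row sums of `phi` equal the variance inflation factors.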

Combining information from Table 3.7 and Figure 3.3 (Chapter 3) with information

from Table 4.3, breeds with smaller number of records and higher standard errors for the

estimated regression coefficients, which included BD, GV, MA, SA, and SH, had lower

proportion of the variance of the estimates associated with linear dependences among

predictor variables. On the contrary, breeds with larger number of records and lower

standard errors for the estimated regression coefficients, including AN, CH, HE, LM, and

SM, had higher proportion of the variance of the estimates associated with linear

dependences among predictor variables.

The second largest condition index (CI = 7.50) points out possible dependencies

involving maternal dominance and direct epistatic loss effects (Table 4.3): 85% and 83%

of the variances of the estimated regression coefficients of maternal dominance and direct

epistatic loss effects, respectively, were associated with linear dependences between the

corresponding predictor variables.

Ridge parameter K

The ridge parameters obtained by the two objective methods are shown in Table 4.4.

The selected constant $\theta_b$ for calculating the ridge parameter K in R2, which minimizes the

MSEP, was equal to 0.04. The average and the standard deviation of the number of

unchosen observations over the bootstrap samples in the last iteration for solving the

genetic model (1) were equal to 176,002 and 257, respectively. The elements of the ridge

parameter K obtained on the basis of R1 were generally smaller than those on the basis of

R2.

Convergence of estimates of fixed genetic effects

Estimates of direct and maternal breed additive, dominance, and epistatic loss effects

after each iteration for solving the genetic model (1), under both ridge regression

methods, are depicted in Figures 4.2 and 4.3, respectively. To minimize the number of

iterations needed to achieve convergence, the least squares estimates were used as the

starting point for R1 and R2. Estimates of fixed genetic effects converged faster under R2

(60 iterations) than under R1 (135 iterations). The slower convergence of R1 was due to

direct and maternal breed additive effects of SA. In general, estimates of direct and

maternal effects moved in opposite directions in the first iterations before stabilizing.

Mean squared error of prediction and variance inflation factor

Table 4.5 shows the average MSEP and VIF obtained over one hundred bootstrap

samples under LS and ridge regression methods. Both ridge regression methods

outperformed the LS estimator with regard to MSEP and VIF. The MSEP of the two ridge

regression methods were similar and were 3% lower than the MSEP of LS. Ridge

regression methods were also superior to LS when compared with respect to reduction in

VIF. The average VIF of LS estimates was equal to 26.81. This value was reduced to 6.10

and 4.18 by R1 and R2, respectively. Under R1, only two regression coefficient estimates

still had VIF larger than 10, whereas in R2 all VIF were lower than 10.

The last two rows in Table 4.5 are the total variance and the square of the estimates.

Methods used to deal with multicollinearity problems typically generate predictors with

smaller variance and smaller range of the predictor vector in comparison to the LS

estimator. These two statistics are influenced by the magnitude of the ridge parameter.

Small ridge parameters imply less restriction (shrinkage) on the size of regression

coefficients, while large values of ridge parameters imply an a priori belief that estimates

of regression coefficients should be smaller or restricted.

From Table 4.5, the two ridge regression methods provided a general improvement

over the LS, when evaluated by MSEP and average VIF obtained over a large number of

bootstrap samples. Additional information for comparing the ridge regression methods,

based on reduction of instability of each parameter estimate, is presented in Figure 4.4.

When multicollinearity was of concern, both ridge regression methods caused substantial

reduction in the VIF, but VIF given by R2 were smaller than VIF given by R1 for most

predictor variables.

Bias measurement

A known relationship between the ridge parameter and both variance and bias of the

ridge regression estimates is that, as the ridge parameter increases, the variance decreases

and the bias increases. The ridge parameters obtained on the basis of R2 were generally

larger than the ridge parameters obtained on the basis of R1 (Table 4.4). As a

consequence, larger bias in the estimates of regression coefficients of R2, compared to

R1, can be expected. Bias measurements of R1 and R2 were equal to 1.49% and 5.61%,

respectively.
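
The trade-off between variance and bias can be illustrated numerically. The sketch below uses the simplification K = kI (a single ridge constant) and a fixed-effects regression, so that the covariance matrix of the ridge estimates is proportional to (X'X + K)^{-1} X'X (X'X + K)^{-1}; the bias is measured with the norm-based statistic, with the Frobenius norm assumed. The matrix and the grid of k values are hypothetical.

```python
import numpy as np

def total_variance_and_bias(XtX, k):
    """Trace of Var(b_k) in units of sigma^2 and the norm-based bias measure (%) for K = k*I."""
    p = XtX.shape[0]
    A = np.linalg.inv(XtX + k * np.eye(p))
    cov = A @ XtX @ A                        # Var(b_k) / sigma^2
    H = A @ XtX                              # E(b_k) = H b
    bias = (1.0 - np.linalg.norm(H) / np.sqrt(p)) * 100.0   # ||I|| = sqrt(p)
    return np.trace(cov), bias

# Hypothetical X'X in correlation form for two strongly correlated predictors.
XtX = np.array([[1.00, 0.95],
                [0.95, 1.00]])
ks = [0.0, 0.01, 0.05, 0.20]
results = [total_variance_and_bias(XtX, k) for k in ks]
variances = [v for v, _ in results]
biases = [b for _, b in results]
```

Across the grid, the total variance falls and the bias measurement rises as k increases, which is the pattern described for R1 and R2.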

Dominance and epistatic loss effects

Estimates of dominance and epistatic loss effects and respective standard errors

obtained by LS and by ridge regression methods are presented in Table 4.6. For ease of

comparison, estimates and corresponding standard errors are also displayed in Figures 4.5

and 4.6, respectively. Estimates of direct and maternal dominance effects were obtained

as partial regressions on predictor variables HD and HM, while estimates of direct and

maternal epistatic loss effects were obtained as partial regressions on predictors ED and

EM, respectively. Estimates of dominance and epistatic loss effects were of opposite sign.

Both direct and maternal dominance effects resulted in a favourable effect on pre-

weaning gain. Direct and maternal epistatic loss effects decreased the pre-weaning gain.

The estimate of maternal epistatic loss, however, was not statistically different from zero

(P > 0.05).

Because predictor variables HM and ED were involved in multicollinearity, ridge

regression methods caused substantial changes in the estimates of maternal dominance

and direct epistatic loss effects. A small reduction in the standard errors of estimates of

maternal dominance and direct epistatic loss effects was obtained through ridge regression

methods. This reduction was slightly more pronounced under the R2 method.

Breed additive effects

Estimates and standard errors of direct and maternal breed additive effects, as

deviations from AN, are presented in Table 4.7. Estimates and standard errors of direct

and maternal breed additive effects are also depicted in Figures 4.5 and 4.6, respectively.

Estimates of direct breed additive effects showed large standard errors under LS

(Figure 4.5). Ridge regression methods substantially reduced the standard errors when

predictor variables were more associated with multicollinearity. In the extreme case of

multicollinearity pointed out by the largest VIF in Figure 4.1, which corresponds to the

HE breed, the standard error of the estimate of direct breed additive genetic effect was

reduced from 0.63 under LS to 0.20 under R1 and to 0.10 under R2.

Estimates of maternal breed additive effects had a different pattern in comparison to

direct effects. The ridge regression estimates of maternal breed effects of GV, HE, MA,

and SM were of larger magnitude than ordinary least squares estimates (Figure 4.6). The

standard errors given by ridge regression methods, however, were always smaller than

standard errors given by LS. Increasing the ridge parameter K indefinitely in the ridge

regression analysis will force all coefficients to zero, but for small values of ki it is not

uncommon to see a regression coefficient increase in absolute value as ki increases

(Marquardt and Snee, 1975).

Figures 4.5 and 4.6 show that estimates of direct and maternal breed additive effects

of BD, GV, MA, SA, and SH still had relatively large standard errors under ridge

regression methods in comparison to the remaining breeds. It was previously shown,

however, that variance-decomposition proportions of maternal breed additive effects for

BD, GV, MA, SA, and SH associated with the largest condition index were lower than

0.5 (Table 4.3). Thus, the large standard errors of the estimates of maternal breed effects

for BD, GV, MA, SA, and SH are more likely a consequence of the relatively small

number of observations in these breeds rather than due to multicollinearity involving the

corresponding predictor variables.

The use of ridge regression methods caused changes in the contrasts between

estimates of breed effects, which will ultimately be reflected in how breeds rank. Because

breed estimates are part of across-breed estimated breeding values used as a criterion of

selection in the beef industry, the option for a particular model may have practical

implications.

Sampling correlations

To assess the degree of confounding between estimates

given by LS, R1, and R2, sampling correlations among estimates of breed additive,

dominance, and epistatic loss effects were calculated. Overall averages of absolute values

of pairwise correlations between estimates under LS, R1, and R2 were equal to 0.49, 0.30,

and 0.18, respectively. These correlations indicated a substantial reduction in the degree

of overall association between estimates given by ridge regression methods, especially

with R2, compared with LS. The reduction in the degree of association between

estimates was more pronounced between estimates of direct and maternal breed effects

involving different breeds than between direct and maternal breed effects for the same

breed.
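
Sampling correlations of this kind can be derived from the covariance matrix of the estimates. The sketch below assumes a fixed-effects regression, in which that covariance matrix is proportional to (X'X + K)^{-1} X'X (X'X + K)^{-1}; in the mixed model used in the study the covariances would come from the corresponding mixed model equations. The matrix and ridge parameters are hypothetical.

```python
import numpy as np

def sampling_correlations(XtX, K):
    """Correlations among estimates; K = 0 gives the LS case."""
    A = np.linalg.inv(XtX + K)
    cov = A @ XtX @ A                       # proportional to Var(b_k)
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

def mean_abs_offdiag(C):
    """Average absolute value of the pairwise (off-diagonal) correlations."""
    mask = ~np.eye(C.shape[0], dtype=bool)
    return np.abs(C[mask]).mean()

# Hypothetical X'X in correlation form for two strongly correlated predictors.
XtX = np.array([[1.0, 0.9],
                [0.9, 1.0]])
C_ls = sampling_correlations(XtX, np.zeros((2, 2)))
C_ridge = sampling_correlations(XtX, np.diag([0.1, 0.1]))
avg_ls, avg_ridge = mean_abs_offdiag(C_ls), mean_abs_offdiag(C_ridge)
```

In this example the predictors are positively correlated, the LS estimates are negatively correlated to the same degree, and the ridge parameter reduces the absolute sampling correlation.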

Figure 4.7 shows correlations between estimates of maternal dominance and direct

epistatic loss effects and between direct and maternal breed additive effects for the same

breed. Averages of these correlations were equal to –0.88, –0.79, and –0.74 under LS, R1,

and R2, respectively. Under ridge regression methods, breeds more involved in

multicollinearity showed a substantial reduction in the degree of confounding between

estimates of direct and maternal breed additive effects, noticeably under R2. On the

contrary, estimates of maternal dominance and direct epistatic loss effects were still

highly correlated under both ridge regression methods. The correlation between estimates

of maternal dominance and direct epistatic loss effects was 0.94 under LS and 0.93 under both

ridge regression methods.

Comparison of across-breed estimated breeding values

Comparisons of ABC from additive-dominance models AD-AH, AD-LS, AD-R2,

and the additive-dominance-epistatic models ADE-LS and ADE-R1 with ABC from

additive-dominance-epistatic model ADE-R2 with respect to Pearson and Spearman

correlations, and percentages of coincidence for different proportion of selected sires,

dams, and calves are depicted in Figure 4.8. Overall Pearson and Spearman correlations

between ABC were high, ranging from 0.85 to 1.0. ABC from model AD-R2 were

perfectly correlated with ABC from model ADE-R2. Even when only 1% of top animals

were compared on the basis of ABC, percentages of coincidence between AD-R2 and

ADE-R2 were equal to or higher than 0.99. Both models AD-R2 and ADE-R2 used ridge

regression method R2 for estimating the fixed genetic effects, but model AD-R2 did not

include epistatic loss effects. Thus, the inclusion of epistatic loss in the genetic model did

not cause re-ranking of sires, dams, and calves. This observation is also corroborated by

the fact that Pearson and Spearman correlations and percentages of coincidences of

models AD-LS and ADE-LS with model ADE-R2 were very similar. Fixed genetic

effects of both models AD-LS and ADE-LS were estimated using LS, but they differed

by the fact that model AD-LS did not include epistatic loss effects.

When the highest 1% ABC of sires, dams, and calves under model ADE-R2 and

under models AD-AH, AD-LS, ADE-LS, and ADE-R1 were compared, percentages of

coincidence were much lower than the overall Pearson and Spearman correlations,

especially in the model AD-AH (0.66, 0.65, and 0.61, for sires, dams, and calves,

respectively). These results point out important re-ranking of top animals. Among the 1%

best sires, dams, and calves, 34, 35, and 39% selected based on ADE-R2 would not be

selected based on model AD-AH, which assumed an ad hoc heterosis of 5% for both

direct and maternal effects and did not account for multicollinearity among predictor

variables. As the percentage of selected animals under model AD-AH increased to 40%,

the percentages of coincidence with model ADE-R2 increased to 0.78 for sires, 0.81 for

dams, and 0.78 for calves. Higher percentages of coincidence between models AD-AH

and AD-LS (not shown) than between models AD-AH and ADE-R2 suggested that

practical differences between models AD-AH and ADE-R2 were predominantly from

differences in breed additive effects rather than different non-additive effects. When 1%,

10%, 20%, and 40% highest calves’ ABC under models AD-AH and AD-LS were

compared, percentages of coincidence were 0.81, 0.87, 0.92, and 0.93, respectively.

Models AD-LS and ADE-LS were similarly correlated with model ADE-R2.

Compared to model AD-AH, models AD-LS and ADE-LS had larger percentages of

coincidence with model ADE-R2. However, the difference with model ADE-R2 was still

substantial. Among the 1% highest ABC, approximately 30% of selected animals, based

on ADE-R2, would not be selected based on models AD-LS and ADE-LS. Among the

40% highest ABC under model ADE-R2, approximately 20% of selected animals would

not be selected based on models AD-LS and ADE-LS. Model ADE-R1 showed a larger

percentage of coincidence with model ADE-R2 than models AD-AH, AD-LS, and ADE-

LS, but differences with ADE-R2 were still considerable. These results confirm that the

choice of the method to estimate the ridge parameter has consequences to genetic

selection, resulting in different ranking of animals on the basis of across-breed estimated

breeding values.

Use of the same ridge parameter in subsequent genetic evaluations

The inclusion of a small bias in the estimates of fixed genetic effects through the

ridge parameter K of both ridge regression methods resulted in smaller average MSEP

and VIF compared with LS. Because the optimal value of the ridge parameter K

depends on the unknown parameter vector b and unknown error variance $\sigma^2$, in practice K

must be determined empirically or estimated from the data. In this study, K was

determined from the data using two different objective methods. Estimating K as

often as genetic evaluations are commonly run might increase computational

demand when very large datasets are used. Thus it would be worth investigating whether

the same ridge parameter could produce stable estimates of fixed genetic effects in

subsequent genetic evaluations, when more records are added to the data file. This is

equivalent to investigating how much K changes when more records are added to the data

file.

A simulation of data accumulation was performed. The ridge parameter K was

determined using records from 1986 (first year with available records) to 1996 and was

used in the estimation of fixed genetic effects when records of subsequent years (1997,

1998, and 1999) were added to the data file. Ridge regression methods and LS were then

compared with respect to stability of estimates over years. Table 4.8 presents the number

of calves from 1986 to the mentioned year, expressed as equivalent purebred calves. The

percentages of the total number of calves per breed in 1996, which were used to

determine the ridge parameter, varied from 58.87% (GV) to 95.74% (HE). The number of

calves showed large variation among breeds. GV and HE represented the smallest and the

largest number of calves, respectively. The ridge parameter K obtained by the two

objective methods, using records from 1986 to 1996, is shown in Table 4.9. The ridge

parameter determined by R2 did not greatly differ from the ridge parameter determined

using the entire dataset (Table 4.4). However, noticeable changes in the ridge parameter

determined by R1 were observed, particularly for predictor variables of maternal breed

additive effects of SA, GV, MA, and SM. An attempt was made to estimate the ridge

parameter using records from 1986 to 1995, but maternal breed effect of GV did not

converge with R1 after 200 iterations, likely due to the small number of records for this

breed.

Estimates of breed additive, dominance, and epistatic loss genetic effects under LS,

R1, and R2, from 1996 to 1999, were used to construct Figures 4.9, 4.10, and 4.11,

respectively. Estimates given by both ridge regression methods were less sensitive to

inclusion of new records than estimates given by LS, regardless of the fact that the ridge

parameter was determined using records from 1986 to 1996. Estimates under R2 were

more stable across years than estimates under R1.
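
The data accumulation exercise can be sketched for a simple fixed-effects regression. The ridge parameter is fixed from records up to 1996 and reused as records from later years are added. Everything in the example is hypothetical: the simulated records, the function names, and the use of theta = 0.04 with the VIF-proportional rule in place of a full bootstrap search.

```python
import numpy as np

def correlation_form(X, y):
    """X'X and X'y after centering and scaling the predictors, plus the scaling factors."""
    Xc = X - X.mean(axis=0)
    scale = np.sqrt((Xc ** 2).sum(axis=0))
    Z = Xc / scale
    return Z.T @ Z, Z.T @ (y - y.mean()), scale

def estimates_by_year(X, y, year, K, eval_years):
    """Re-estimate coefficients with a fixed K as each year's records are added."""
    est = {}
    for yr in eval_years:
        keep = year <= yr                               # records accumulated up to this year
        R, r, scale = correlation_form(X[keep], y[keep])
        est[yr] = np.linalg.solve(R + K, r) / scale     # back to the original units
    return est

# Hypothetical records for 1986 to 1999 with two nearly collinear predictors.
rng = np.random.default_rng(2)
year = np.repeat(np.arange(1986, 2000), 50)
n = year.size
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

# K is fixed from records up to 1996; theta = 0.04 is taken as given, not re-estimated.
R96, _, _ = correlation_form(X[year <= 1996], y[year <= 1996])
vif96 = np.diag(np.linalg.inv(R96))
K = np.diag(0.04 * vif96 / vif96.max())

eval_years = [1996, 1997, 1998, 1999]
ridge = estimates_by_year(X, y, year, K, eval_years)
ls = estimates_by_year(X, y, year, np.zeros_like(K), eval_years)
```

Comparing the year-to-year changes in `ridge` and `ls` gives the kind of stability comparison summarized in Figures 4.9 to 4.11.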

DISCUSSION

Breed additive, dominance, and epistatic loss effects are generally estimated using a

multiple regression equation, where breed composition, breed heterozygosities and

functions of heterozygosities can be used as predictor variables. Linear relationships

among predictor variables used to estimate breed additive, dominance, and epistatic loss

effects result in multicollinearity. As a consequence, it is difficult to estimate the unique

effect of an individual variable in the regression equation and regression coefficients are

highly unstable. A potential approach to deal with multicollinearity problems is the ridge

regression method.

In the current investigation, multicollinearity diagnostics were performed using

different measures: variance inflation factor, condition indices, and variance-

decomposition proportions associated with eigenvalues. These measures of

multicollinearity were obtained after standardization (centering and scaling) of predictor

variables, as recommended by Marquardt and Snee (1975) and Freund and Littell (2000).

Scaling the predictor variables removes the near dependencies that are due to the scales

on which predictor variables were expressed rather than a real defect of the data, while

centering the predictor variables removes the correlation between the constant term and

all linear terms in a linear model.
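
The effect of standardization can be shown with a short sketch. Centering and scaling each predictor to unit length makes X'X equal to the correlation matrix; the data below are hypothetical and chosen so that the two predictors are unrelated but expressed on very different scales.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit length, so that Z'Z is the correlation matrix."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))

rng = np.random.default_rng(3)
x1 = rng.normal(size=400)                 # predictor on a unit scale
x2 = 1000.0 * rng.normal(size=400)        # unrelated predictor on a much larger scale
X = np.column_stack([x1, x2])
Z = standardize(X)

cond_raw = np.linalg.cond(X.T @ X)        # inflated purely by the difference in scale
cond_std = np.linalg.cond(Z.T @ Z)        # close to one after standardization

# Centering makes a predictor orthogonal to the constant term.
x3 = 50.0 + rng.normal(size=400)
inner_raw = np.ones(400) @ x3
inner_centered = np.ones(400) @ (x3 - x3.mean())
```

The large condition number of the raw cross-products matrix reflects only the units of measurement, which is why the diagnostics are computed in correlation form.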

Multicollinearity diagnostics suggested that direct and maternal breed effects were

the main candidates for linear dependencies, followed by maternal dominance and direct

epistatic loss effects. The multicollinearity involving breed composition can be partially

explained by the mathematical constraint among breeds, because breed portions in the

breed composition of an animal add to one and the breed composition of a calf is equal to

the average breed composition of the sire and of the dam. In practice, after fitting breeds

that are more representative in the data, less new information is added by fitting the

remaining breeds. Similarly, after fitting the breed of the dam, less new information is

added fitting the breed of the calf, and vice-versa. The other possible multicollinearity

problem, which involved coefficients for maternal dominance effects and direct epistatic

loss effects, can be a consequence of the small proportion of crossbred sires, as shown in

Chapter 3.
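
The mathematical constraint on breed composition can be demonstrated with simulated data. In this hypothetical sketch, sire and dam breed compositions are drawn at random, the calf composition is their average, and the design matrix holds an intercept, the direct (calf) breed coefficients, and the maternal (dam) breed coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_breeds = 200, 4
sire = rng.dirichlet(np.ones(n_breeds), size=n)   # each row sums to one
dam = rng.dirichlet(np.ones(n_breeds), size=n)
calf = 0.5 * (sire + dam)                          # average of sire and dam compositions

# Intercept + 4 direct + 4 maternal columns: two exact dependencies, one per set of breeds.
X_full = np.column_stack([np.ones(n), calf, dam])
rank_full = np.linalg.matrix_rank(X_full)

# Expressing effects as deviations from a reference breed removes the exact dependencies.
X_dev = np.column_stack([np.ones(n), calf[:, 1:], dam[:, 1:]])
rank_dev = np.linalg.matrix_rank(X_dev)

# A near dependency remains between direct and maternal coefficients of the same breed.
r_same_breed = np.corrcoef(calf[:, 1], dam[:, 1])[0, 1]
```

Dropping a reference breed, as with the deviations from AN used for the breed additive effects, restores full column rank, but the direct and maternal coefficients of a breed stay correlated because half of the calf composition comes from the dam.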

Ridge regression models that add the same amount to the diagonal of the matrix X'X

are known in the literature as ordinary ridge regression. Previous analysis using different

ordinary ridge regression methods described by Gruber (1998) resulted in small reduction

in the variance inflation factors and similar MSEP to LS (not reported), in line with

Delaney and Chatterjee (1986). These authors stated that the ordinary ridge regression

model is not appropriate for multicollinearity caused by physical or mathematical

constraints in the data. Because breed composition sums to one for each observation, a

mathematical constraint was present in the data. Generalized ridge regression methods are

advised to deal with this source of multicollinearity.

With an optimal choice of the ridge parameter matrix K, the ridge estimators have

smaller individual and total MSE than the LS estimators (Hoerl and Kennard, 1970a;

Lowerre, 1974; Gruber, 1998). The optimal K, however, cannot be determined with

certainty because it depends on the unknown parameter vector b and the unknown error

variance $\sigma^2$. As a consequence, K must be determined empirically or estimated from the

data. Thus, K could change as data are accumulated over time.

The performance of ridge regression methods has been generally evaluated in terms

of reduction in MSE compared to LS using computer simulation (Gruber, 1998). A given

simulation, of course, cannot hope to cover a large range of practical situations,

particularly when a large number of factors are involved. In this study, the performance

of ridge regression methods was evaluated in terms of MSEP, as in Delaney and

Chatterjee (1986) and Hébel et al. (1993), under the assumption that smaller MSE will

result in smaller MSEP. A procedure combining bootstrap resampling and cross-

validation was used to obtain the average MSEP over a large number of samples. This

procedure is supported by maximum likelihood principles because sample statistics based

on a large number of bootstrap samples tend to approach the true parameter (Delaney

and Chatterjee, 1986). VIF of the estimates under ridge regression methods and LS were

also obtained and used to evaluate the performance of ridge regression methods.

Results obtained by LS and by ridge regression methods R1 and R2 were compared

through average MSEP and VIF obtained over one hundred bootstrap samples. The use of

ridge regression resulted in smaller MSEP and VIF than LS. Both ridge regression

methods had similar MSEP (3% lower than under LS), suggesting that specific linear

combinations of estimated regression coefficients were equally determined, even though

individual coefficients differed between methods (Belsley, 1991).

Average VIF given by ridge regression methods R1 and R2 were 77% and 84% lower,

respectively, than average VIF given by LS. Therefore, larger bias in the estimates

of R2 was compensated for by a substantial reduction in the variance of the estimates.

Consequently, estimates obtained by ridge regression methods, notably by R2, will be

less sensitive to small changes in the dataset, such as inclusion of new observations. This

expectation was confirmed when ridge parameters determined using records from 1986 to

1996 were used in the estimation of fixed genetic effects of subsequent years. The

estimates of breed additive, dominance, and epistatic loss effects under ridge regression

methods were more stable over the years than estimates under LS. This observation has

practical implications in routine genetic evaluations: First, more consistency in the across-

breed estimated breeding values can be expected in successive genetic evaluations, which

can foster more confidence among producers in the genetic improvement program, and,

second, the ridge parameter can be estimated less often than genetic evaluations are run,

decreasing the computational demand.
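
The percentage reductions in average VIF quoted above follow directly from the averages reported in the Results (26.81 for LS, 6.10 for R1, and 4.18 for R2), as the short check below shows. The variable names are illustrative.

```python
# Average VIF reported for LS, R1, and R2.
vif_ls, vif_r1, vif_r2 = 26.81, 6.10, 4.18

# Relative reduction in average VIF compared with LS, in percent.
reduction_r1 = 100.0 * (vif_ls - vif_r1) / vif_ls
reduction_r2 = 100.0 * (vif_ls - vif_r2) / vif_ls
```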

Estimated direct and maternal dominance effects were favourable, whereas direct and

maternal epistatic loss decreased the pre-weaning gain. Maternal epistatic loss effects,

however, were not statistically different from zero. Dominance effects, represented by

coefficients HD and HM, indicate deviation from average dominance within breed due to

differences in gene frequencies between breeds (breed heterozygosity). Coefficients of

epistatic loss effects, ED and EM, express the recombination loss due to breed

heterozygosity in relation to F2 calves and F2 dams, respectively. According to Koch et

al. (1985), long-term selection within a breed can increase frequencies of favourable non-

allelic combinations, which result in favourable effects on phenotype. When breeds are

crossed, random recombination of loci in the progeny tends to reduce the frequencies of

these parental breed combinations towards Hardy-Weinberg equilibrium, resulting in

recombination loss.

Estimated maternal dominance and direct epistatic loss effects were of opposite sign

and comparable magnitude, and had large standard errors under LS. Both ridge regression

methods R1 and R2 seemed to slightly alleviate the multicollinearity involving maternal

dominance and direct epistatic loss effects. The estimates of maternal dominance and

direct epistatic loss effects were reduced from 2.28% and –2.19% under LS to 1.72% and

–1.04% under R1, and to 1.55% and –0.66% under R2, respectively. Sampling

correlations between estimates, however, showed that maternal dominance and direct

epistatic loss effects were still highly confounded under ridge regression methods. These

results suggest that the variety of crosses available in the dataset, aggravated by linear

dependences between HM and ED, did not comprise enough information to effectively

separate maternal dominance and direct epistatic loss effects, regardless of the fact that both

effects were statistically significant.

Estimates of direct and maternal dominance obtained in this study were lower than

the range of heterosis from 3 to 8% (mean = 4%) reported by Long (1980) on pre-

weaning gain. The causes for these low estimates of dominance effects are not clear.

Partial confounding of contemporary group effects with breed composition and breed

heterozygosity effects could be a reasonable explanation of the low estimates of

dominance effects. However, a preliminary analysis to check for connectedness among

contemporary groups across breeds was performed and only connected contemporary

groups with at least two classes of direct or maternal heterozygosities were retained for

the analysis. Additional analyses where dominance effects were estimated for pairs of

breeds with a large number of records to accommodate possible specific combining

ability between breed pairs, excluding epistatic loss effects in the model, likewise resulted

in low estimates (not reported), in agreement with results obtained by Miller (1996).

Additive-dominance models, which simultaneously estimate additive and heterosis

effects or estimate additive effects after adjusting for heterosis on the basis of expected

breed heterozygosity, are standard models for genetic evaluation in the beef industry.

Comparisons of ABC from additive-dominance models with ABC from additive-

dominance-epistatic models, using either LS or ridge regression methods for estimating

fixed genetic effects, allowed two important practical observations. The first observation

was that the re-ranking of sires, dams, and calves was essentially due to differences in

estimates of breed additive effects rather than differences in non-additive effects. The

second observation was that the inclusion of epistatic loss effects in the additive-

dominance model did not alter the rank of the animals on the basis of ABC. Estimates of

dominance and epistatic loss effects were of low magnitude and showed high degree of

confounding even when ridge regression methods were used.

CONCLUSIONS

Linear dependencies between predictor variables of direct and maternal breed

additive effects and between predictor variables of maternal dominance and direct

epistatic loss effects were the main causes of multicollinearity. Ridge

regression methods outperformed LS with regard to average VIF and MSEP. Estimates

obtained by ridge regression were more stable and could be used with advantage over LS

for prediction purposes. The ridge regression methods were particularly effective in

reducing the degree of multicollinearity involving predictor variables of breed additive

effects. The use of estimates obtained by ridge regression methods instead of estimates

obtained by LS for calculating ABC could increase the probability of properly ranking

animals for across breed comparisons.

The variety of crosses in the dataset provided little opportunity to separate

dominance and epistatic loss effects and to estimate them accurately. Due to the high degree

of confounding between estimates of maternal dominance and direct epistatic loss effects,

it was not possible to compare the relative importance of these effects with a high level of

confidence. The inclusion of epistatic loss effects in the standard additive-dominance

model used in genetic evaluation did not cause appreciable re-ranking of animals on the

basis of ABC.

Table 4.1. Correlation coefficients among predictor variables of direct (D) and maternal

(M) fixed genetic effects (n = 478,466)

HD a ED AND BDD CHD GVD HED LMD MAD SAD SHD SMD UND

HD 1.00

ED 0.45 1.00

AND –0.09 –0.08 1.00

BDD 0.00 0.00 –0.05 1.00

CHD 0.06 0.05 –0.18 –0.08 1.00

GVD 0.01 0.02 –0.01 0.00 –0.04 1.00

HED –0.14 –0.20 –0.23 –0.09 –0.29 –0.05 1.00

LMD 0.10 0.10 –0.13 –0.07 –0.22 –0.03 –0.24 1.00

MAD 0.02 0.00 –0.03 –0.02 –0.05 –0.01 –0.07 –0.05 1.00

SAD –0.05 –0.02 –0.04 –0.02 –0.08 –0.01 –0.10 –0.07 –0.01 1.00

SHD 0.03 0.00 –0.06 –0.03 –0.08 –0.01 –0.11 –0.05 0.01 –0.01 1.00

SMD 0.00 0.05 –0.16 –0.07 –0.22 –0.04 –0.26 –0.23 –0.05 –0.07 –0.08 1.00

UND 0.26 0.32 –0.09 0.01 –0.02 0.00 –0.19 0.05 0.01 –0.01 –0.04 –0.03 1.00

HM 0.43 0.95 –0.07 0.00 0.07 0.02 –0.20 0.09 0.00 –0.02 –0.01 0.06 0.29

EM 0.09 0.17 –0.02 0.01 0.01 0.04 –0.15 0.06 0.03 0.02 –0.03 0.10 0.06

ANM –0.08 –0.06 0.90 –0.04 –0.16 0.00 –0.22 –0.08 –0.02 –0.03 –0.07 –0.15 –0.10

BDM –0.06 –0.03 –0.04 0.85 –0.06 0.01 –0.09 –0.05 –0.01 –0.02 –0.02 –0.06 –0.01

CHM –0.05 0.02 –0.14 –0.04 0.81 –0.02 –0.30 –0.12 –0.04 –0.06 –0.09 –0.17 –0.04

GVM –0.02 0.00 –0.01 0.00 –0.03 0.82 –0.04 –0.02 –0.01 –0.01 –0.01 –0.03 –0.01

HEM 0.07 –0.15 –0.24 –0.09 –0.24 –0.05 0.89 –0.18 –0.07 –0.10 –0.13 –0.22 –0.21

LMM –0.11 0.03 –0.09 –0.03 –0.16 –0.01 –0.23 0.80 –0.03 –0.05 –0.05 –0.17 0.00

MAM 0.01 0.00 –0.02 –0.01 –0.04 0.00 –0.07 –0.03 0.83 –0.01 0.00 –0.04 –0.02

SAM –0.08 –0.04 –0.04 –0.02 –0.07 0.00 –0.09 –0.06 –0.01 0.92 –0.02 –0.07 –0.02

SHM 0.09 0.03 –0.06 –0.03 –0.07 –0.01 –0.11 –0.03 0.01 0.00 0.92 –0.07 –0.05

SMM –0.02 0.05 –0.13 –0.05 –0.15 –0.02 –0.27 –0.16 –0.05 –0.06 –0.08 0.85 –0.06

UNM 0.24 0.29 –0.08 0.01 0.00 0.01 –0.19 0.04 0.01 0.00 –0.05 –0.03 0.92

PWG –0.05 –0.01 0.02 –0.12 0.03 0.06 –0.19 0.07 0.02 0.21 –0.01 0.05 –0.02

Table 4.1. Continuation …

HM EM ANM BDM CHM GVM HEM LMM MAM SAM SHM SMM UNM

HM 1.00

EM 0.19 1.00

ANM –0.06 –0.05 1.00

BDM –0.03 0.01 –0.04 1.00

CHM 0.02 0.03 –0.16 –0.05 1.00

GVM 0.00 0.03 –0.01 0.00 –0.03 1.00

HEM –0.16 –0.16 –0.25 –0.10 –0.33 –0.04 1.00

LMM 0.03 0.08 –0.10 –0.04 –0.15 –0.02 –0.25 1.00

MAM 0.00 0.03 –0.03 –0.01 –0.04 –0.01 –0.07 –0.03 1.00

SAM –0.03 0.01 –0.04 –0.02 –0.06 –0.01 –0.10 –0.04 –0.01 1.00

SHM 0.02 –0.02 –0.07 –0.03 –0.09 –0.01 –0.14 –0.06 –0.01 –0.02 1.00

SMM 0.06 0.11 –0.16 –0.06 –0.20 –0.03 –0.29 –0.16 –0.05 –0.06 –0.09 1.00

UNM 0.31 0.08 –0.10 –0.01 –0.04 –0.01 –0.23 –0.01 –0.02 –0.02 –0.06 –0.05 1.00

PWG 0.01 0.06 0.01 0.02 0.05 0.04 –0.19 –0.07 0.03 0.07 –0.01 0.22 –0.01

a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,

CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,

SA = Salers, SH = Shorthorn, SM = Simmental, UN = Unknown, and

PWG = pre-weaning gain.

Table 4.2. Eigenvalues of the correlation matrix among predictor variables of fixed

genetic effects and corresponding condition indices

Eigenvalue   Condition index     Eigenvalue   Condition index
2.85885       1.00000            0.24485       3.41702
2.32962       1.10778            0.19129       3.86586
2.22183       1.13433            0.18175       3.96606
2.16618       1.14881            0.16806       4.12443
2.01692       1.19056            0.15531       4.29032
1.95458       1.20940            0.11851       4.91155
1.90077       1.22640            0.10036       5.33732
1.85255       1.24226            0.08539       5.78605
1.83773       1.24725            0.07706       6.09099
1.80300       1.25921            0.05844       6.99450
0.91022       1.77224            0.05078       7.50351
0.71406       2.00092            0.00189      38.85385
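
The tabulated condition indices are consistent with the usual definition, the square root of the ratio of the largest eigenvalue to each eigenvalue. The short Python sketch below recomputes a few of them from the eigenvalues printed in Table 4.2; the function name is illustrative only, and small discrepancies are expected because the eigenvalues are rounded to five decimals.

```python
import math

# Largest eigenvalue and the three smallest eigenvalues printed in Table 4.2.
eigenvalues = [2.85885, 0.05844, 0.05078, 0.00189]

def condition_indices(eigs):
    """Return sqrt(lambda_max / lambda_i) for every eigenvalue in eigs."""
    lam_max = max(eigs)
    return [math.sqrt(lam_max / lam) for lam in eigs]

indices = condition_indices(eigenvalues)
for lam, ci in zip(eigenvalues, indices):
    print(f"eigenvalue {lam:.5f} -> condition index {ci:.2f}")
```

The last value is the one that signals the strongest near-dependency, since it is far larger than all the others.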

Table 4.3. Decomposition of the variance structure of the parameter estimates associated

with the two largest condition indices

Predictor       Condition index = 38.85      Condition index = 7.50
variable a      Direct      Maternal         Direct      Maternal
H               0.00        0.12             0.01        0.85
E               0.12        0.00             0.83        0.0
AN              0.91        0.74             0.00        0.00
BD              0.74        0.40             0.00        0.00
CH              0.96        0.83             0.00        0.00
GV              0.43        0.13             0.00        0.00
HE              0.95        0.85             0.00        0.00
LM              0.95        0.76             0.00        0.00
MA              0.60        0.30             0.00        0.00
SA              0.66        0.31             0.00        0.00
SH              0.66        0.48             0.00        0.00
SM              0.96        0.82             0.00        0.00

a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, and SM = Simmental.



Table 4.4. Values of the ridge parameter (K) obtained by ridge regression methods R1

and R2, for direct and maternal genetic effects

Predictor       Direct                     Maternal
variable a      R1          R2             R1          R2
H               0.000036    0.000754       0.004070    0.004438
E               0.000078    0.004377       0.000092    0.000413
AN              0.003156    0.020457       0.000515    0.008442
BD              0.010801    0.005596       0.027317    0.002506
CH              0.010583    0.030383       0.001836    0.010470
GV              0.005848    0.002052       0.000789    0.001359
HE              0.003317    0.040000       0.034061    0.017590
LM              0.000861    0.024490       0.000328    0.006996
MA              0.000522    0.003141       0.000308    0.001843
SA              0.000442    0.006771       0.011455    0.003565
SH              0.000094    0.006984       0.000413    0.004834




Table 4.5. Summary of results obtained over one hundred bootstrap samples for ordinary

least squares (LS) and ridge regression methods R1 and R2

                                         Ridge regression
Statistic             LS                 R1                 R2
MSEP a ± SD           245.67 ± 2.13      238.40 ± 0.87      238.44 ± 0.87
Average VIF b         26.81              6.10               4.18
Maximum VIF           104.50             10.92              8.70
VIF > 10 c            16                 2                  0
Total variance d      153,371.55         34,918.49          23,909.80
$\hat{b}'\hat{b}$     1,324.27           933.18             466.91

a Mean squared error of prediction.

b Average variance inflation factor of 24 predictor variables.

c Number of predictor variables out of 24 with VIF higher than 10.

d Total variances of LS and ridge regression methods were obtained by
$\operatorname{trace}[(X'X)^{-1}]\sigma^2$ and
$\operatorname{trace}[(X'X + K)^{-1} X'X (X'X + K)^{-1}]\sigma^2$, respectively.
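
The two trace expressions in footnote d can be illustrated numerically. The following Python sketch uses NumPy on synthetic, deliberately collinear data; the data, the residual variance `sigma2`, and the diagonal ridge matrix `K` are all hypothetical and are not taken from the thesis. It only shows that a positive ridge parameter reduces the total variance relative to LS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, deliberately collinear predictors (not the thesis data).
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # centre and scale each column

sigma2 = 1.0                      # assumed residual variance
K = np.diag([0.5, 0.5, 0.0])      # hypothetical ridge parameters

XtX = X.T @ X

def total_variance_ls(XtX, sigma2):
    """trace[(X'X)^-1] * sigma^2"""
    return sigma2 * np.trace(np.linalg.inv(XtX))

def total_variance_ridge(XtX, K, sigma2):
    """trace[(X'X + K)^-1 X'X (X'X + K)^-1] * sigma^2"""
    A = np.linalg.inv(XtX + K)
    return sigma2 * np.trace(A @ XtX @ A)

tv_ls = total_variance_ls(XtX, sigma2)
tv_ridge = total_variance_ridge(XtX, K, sigma2)
print(f"LS: {tv_ls:.4f}   ridge: {tv_ridge:.4f}")
```

With `K` set to a zero matrix the ridge expression reduces to the LS expression, which is a convenient sanity check.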

Table 4.6. Estimates of direct and maternal dominance (H) and epistatic loss (E) effects

on pre-weaning gain (kg), obtained by ordinary least squares (LS) and ridge regression

methods R1 and R2

          LS                                Ridge regression R1               Ridge regression R2
Effect    Direct          Maternal          Direct          Maternal          Direct          Maternal
H         2.67 ± 0.07     4.64 ± 0.17       2.67 ± 0.07     3.50 ± 0.15       2.51 ± 0.07     3.16 ± 0.15
          (1.31%) a       (2.28%)           (1.31%)         (1.72%)           (1.23%)         (1.55%)
E         –4.45 ± 0.31    –0.16 ± 0.15      –2.11 ± 0.29    –0.11 ± 0.15      –1.34 ± 0.27    –0.07 ± 0.15
          (–2.19%)        (–0.08%)          (–1.04%)        (–0.05%)          (–0.66%)        (–0.03%)

a Values between parentheses were expressed relative to the overall phenotypic average.

Table 4.7. Estimates of direct and maternal breed additive effects on pre-weaning gain

(kg), as deviations from Angus, obtained by ordinary least squares (LS) and ridge

regression methods R1 and R2

           LS                                 Ridge regression R1                Ridge regression R2
Breed a    Direct           Maternal          Direct           Maternal          Direct           Maternal
BD         5.34 ± 0.72      –5.68 ± 0.53      3.50 ± 0.36      –4.00 ± 0.36      2.07 ± 0.36      –3.92 ± 0.41
CH         13.19 ± 0.62     –2.88 ± 0.36      9.57 ± 0.21      –0.96 ± 0.16      5.28 ± 0.11      1.09 ± 0.14
GV         10.41 ± 0.92     7.91 ± 0.89       8.62 ± 0.71      8.73 ± 0.82       7.65 ± 0.69      8.92 ± 0.82
HE         –6.26 ± 0.63     –3.21 ± 0.36      –6.04 ± 0.20     –3.07 ± 0.11      –1.36 ± 0.10     –5.15 ± 0.12
LM         –3.07 ± 0.63     0.61 ± 0.38       –4.24 ± 0.24     1.27 ± 0.20       –1.48 ± 0.13     –0.29 ± 0.18
MA         12.29 ± 0.79     0.25 ± 0.59       10.88 ± 0.54     1.02 ± 0.50       9.17 ± 0.50      1.84 ± 0.49
SA         0.60 ± 0.76      7.55 ± 0.60       3.05 ± 0.47      4.63 ± 0.47       0.12 ± 0.43      7.06 ± 0.47
SH         –9.30 ± 0.75     4.57 ± 0.47       –9.83 ± 0.49     4.86 ± 0.35       –4.47 ± 0.42     2.13 ± 0.32
SM         14.19 ± 0.63     5.33 ± 0.37       12.33 ± 0.24     6.34 ± 0.18       6.19 ± 0.12      8.76 ± 0.15

a BD = Blond D’Aquitane, CH = Charolais, GV = Gelbvieh, HE = Hereford,

LM = Limousin, MA = Maine-Anjou, SA = Salers, SH = Shorthorn, and

SM = Simmental.

Table 4.8. Number of calves including records from 1986 to the indicated year, expressed

as equivalent purebred calves

           1996                  1997                  1998                  1999
Breed a    Number b    % c       Number      %         Number      %         Number
AN          40,203     88.65      42,192     93.03      43,864     96.72      45,352
BD           7,119     75.15       8,075     85.24       8,842     93.34       9,473
CH          84,505     88.78      88,409     92.88      92,022     96.67      95,190
GV           1,278     58.87       1,598     73.61       1,865     85.91       2,171
HE         125,674     95.74     127,780     97.35     129,659     98.78     131,265
LM          64,290     93.11      66,278     95.99      67,841     98.25      69,047
MA           4,436     94.06       4,588     97.29       4,700     99.66       4,716
SA           7,944     86.98       8,431     92.31       8,897     97.42       9,133
SH          11,913     93.14      12,312     96.26      12,560     98.19      12,791
SM          72,647     89.81      76,022     93.98      78,575     97.14      80,890

a AN = Angus, BD = Blond D’Aquitane, CH = Charolais, GV = Gelbvieh,

HE = Hereford, LM = Limousin, MA = Maine-Anjou, SA = Salers, SH = Shorthorn,

and SM = Simmental.

b To obtain the number of equivalent purebred calves, breed portions in the breed

compositions were added over all calves in the dataset.

c Number of calves in each year expressed as percentages of the total number of calves.
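
Footnotes b and c describe simple bookkeeping that can be sketched in a few lines of Python. The four calves and their breed compositions below are hypothetical; only the Angus counts used in the final percentage come from Table 4.8.

```python
from collections import defaultdict

# Hypothetical calves: breed code -> fraction of the calf's breed composition.
calves = [
    {"AN": 1.0},                 # purebred Angus
    {"AN": 0.5, "HE": 0.5},      # F1 Angus x Hereford
    {"HE": 0.75, "SM": 0.25},    # backcross-type composition
    {"SM": 1.0},                 # purebred Simmental
]

def equivalent_purebreds(calves):
    """Add the breed fractions over all calves (footnote b)."""
    totals = defaultdict(float)
    for composition in calves:
        for breed, fraction in composition.items():
            totals[breed] += fraction
    return dict(totals)

totals = equivalent_purebreds(calves)
print(totals)

# Footnote c: a cumulative count expressed as a percentage of the final total,
# here the Angus counts for 1996 and 1999 from Table 4.8.
angus_pct_1996 = 100.0 * 40203 / 45352
print(f"{angus_pct_1996:.2f}")
```

Because every calf's fractions add to one, the breed totals add to the number of calves.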

Table 4.9. Values of the ridge parameter (K), obtained by ridge regression methods R1

and R2, using records from 1986 to 1996

Predictor       Direct                     Maternal
variable a      R1          R2             R1          R2
H               0.000044    0.000877       0.002221    0.005142
E               0.000072    0.005068       0.000110    0.000479
AN              0.001998    0.023342       0.000983    0.009726
BD              0.008346    0.006061       0.008435    0.002711
CH              0.015607    0.034452       0.003557    0.011923
GV              0.002693    0.002255       0.004304    0.001556
HE              0.007567    0.046000       0.000489    0.020278
LM              0.000864    0.028105       0.000220    0.008025
MA              0.000656    0.003669       0.008829    0.002151
SA              0.001061    0.007809       0.008047    0.004145
SH              0.000168    0.008066       0.000179    0.005608




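[Figure: VIF (y-axis) by predictor variable (x-axis: H, E, AN, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: Direct, Maternal.]

Figure 4.1. Variance inflation factor (VIF) associated with predictor variables of

direct and maternal dominance (H), epistatic loss (E), and breed additive effects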

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Iteration (0 to 135); y-axis: Estimate (kg); series: H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM.]

Figure 4.2. Convergence of the estimates of direct and maternal dominance (H), epistatic

loss (E), and breed additive effects under ridge regression method R1

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Iteration (0 to 60); y-axis: Estimate (kg); series: H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM.]

Figure 4.3. Convergence of the estimates of direct and maternal dominance (H), epistatic

loss (E), and breed additive effects under ridge regression method R2

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Predictor variable; y-axis: VIF; series: LS, R1, R2.]

Figure 4.4. Variance inflation factor (VIF) associated with predictor variables of direct

and maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary

least squares (LS) and ridge regression methods R1 and R2

[Figure: two "Direct Effects" panels; x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); y-axes: Estimate (kg) and Standard error (kg); series: LS, R1, R2.]

Figure 4.5. Estimates (as deviations from AN) and standard errors of direct dominance

(H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and

ridge regression methods R1 and R2

[Figure: two "Maternal Effects" panels; x-axis: Predictor variable; y-axes: Estimate (kg) and Standard error (kg); series: LS, R1, R2.]

Figure 4.6. Estimates (as deviations from AN) and standard errors of maternal dominance

(H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and

ridge regression methods R1 and R2

[Figure: x-axis: Predictor variable (HM x ED, AN, BD, CH, GV, HE, LM, MA, SA, SH, SM); y-axis: Sampling correlation x –1.0 (0.5 to 1); series: LS, R1, R2.]

Figure 4.7. Sampling correlations (multiplied by –1.0) between estimates of maternal

dominance (HM) and direct epistatic loss (ED) effects and between estimates of direct and

maternal breed additive effects given by ordinary least squares (LS) and ridge regression

methods R1 and R2

[Figure: panels "Sires", "Dams", and "Calves"; x-axis: Model (AD-AH, AD-LS, AD-R2, ADE-LS, ADE-R1); y-axis: Correlation (0.6 to 1); series: Pearson, Spearman, 40%, 20%, 10%, 1%.]

Figure 4.8. Pearson and Spearman correlations, and percentages of coincidence for

different proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves on

the basis of ABC yielded by different models compared to model ADE-R2
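
The percentage of coincidence is not defined in this part of the text. A minimal Python sketch, assuming it is the share of animals selected in the top fraction under one model that are also selected in the same top fraction under the reference model, is given below. The function name and the two lists of ABC values are hypothetical.

```python
def top_coincidence(values_a, values_b, fraction):
    """Percentage of the top `fraction` of animals under A also in the top under B."""
    n_top = max(1, round(fraction * len(values_a)))
    order_a = sorted(range(len(values_a)), key=lambda i: values_a[i], reverse=True)
    order_b = sorted(range(len(values_b)), key=lambda i: values_b[i], reverse=True)
    shared = set(order_a[:n_top]) & set(order_b[:n_top])
    return 100.0 * len(shared) / n_top

# Hypothetical ABC of ten animals under two models; animals 1 and 2 swap ranks.
abc_model_1 = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
abc_model_2 = [10.0, 8.0, 9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

coincidence_20 = top_coincidence(abc_model_1, abc_model_2, 0.20)
coincidence_40 = top_coincidence(abc_model_1, abc_model_2, 0.40)
print(coincidence_20, coincidence_40)
```

The example illustrates why coincidence at small selected proportions is more sensitive to re-ranking than a correlation over all animals: one swap near the top halves the coincidence at 20% but leaves it unchanged at 40%.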

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Predictor variable; y-axis: Estimate (kg); series: 1996, 1997, 1998, 1999.]


Figure 4.9. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed

additive effects (as deviations from AN), under ordinary least squares, using records from

1986 to the indicated year

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Predictor variable; y-axis: Estimate (kg); series: 1996, 1997, 1998, 1999.]

Figure 4.10. Estimates of direct and maternal dominance (H), epistatic loss (E), and

breed additive effects (as deviations from AN), under ridge regression method R1, using

records from 1986 to the indicated year (ridge parameter K was obtained using records

from 1986 to 1996)

[Figure: panels "Direct Effects" and "Maternal Effects"; x-axis: Predictor variable; y-axis: Estimate (kg); series: 1996, 1997, 1998, 1999.]


Figure 4.11. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed

additive effects (as deviations from AN), under ridge regression method R2, using records

from 1986 to the indicated year (ridge parameter K was obtained using records from 1986

to 1996)

Chapter 5

General Discussion

In the previous chapters different problems relating to statistical methods applied to

estimation of breeding values for animals in a multi-breed population of beef cattle were

investigated. In Chapter 2, methods for measuring the degree of connectedness among

test groups of centrally tested beef bulls were assessed and compared. Models to predict

PEVD, which could be routinely used in genetic evaluation programs, were defined.

Chapter 3 was concerned with estimation of variance components, multi-breed additive

genetic changes, and direct and maternal breed additive, dominance, and epistatic loss

effects on pre-weaning gain. Chapter 4 dealt with estimation of genetic effects in the

presence of multicollinearity. Emphasis was placed on obtaining stable estimates of direct and

maternal breed additive, dominance, and epistatic loss effects, which could

contribute to more accurate and consistent multi-breed genetic evaluation of beef cattle.

Degree of connectedness among test groups of

centrally tested beef bulls

The degree of connectedness among test groups is likely a limiting factor for effective

selection across test groups. With a lower degree of connectedness between test groups,

comparison of animals’ EBV from different groups is less accurate and can result in

incorrect ranking of animals across test groups. Kennedy and Trus (1993) suggested that

PEVD should be the basis for measuring connectedness. This statistic, however, is

computationally demanding and very difficult to apply in routine large-scale genetic

evaluation. Various criteria have been proposed for measuring connectedness, but most

are not feasible for implementation in very large-scale genetic evaluation. In Chapter 2,

three alternative methods (VED, CR, and GLT) were studied and used in models to

predict PEVD using weights of bulls tested in central evaluation stations in Ontario,

Canada, from 1988 to 2000. The degree of connectedness was calculated for pairs of test

groups and for each test group with all other test groups. Connectedness between pairs of

test groups indicates the level of accuracy in comparing EBV of animals from two test

groups, whereas average connectedness of each test group with all others indicates the

level of average accuracy in comparing EBV of an animal with animals in all other test

groups, allowing effective selection across all test groups.

Results presented in Chapter 2 indicate that the average PEVD of pairs of test groups

can be more accurately predicted on the basis of the model that includes GLT than on the

basis of models that include either VED or CR. Average PEVD of each test group with all

other test groups can be more accurately predicted on the basis of models that include

either CR or GLT. Because GLT is computationally less demanding than CR, it could be

easily and routinely calculated.

The GLT method used for measuring the degree of connectedness between test

groups, in its original form, considers only the direct genetic links due to common sires

and dams. The animal model used in the estimation of PEVD accounts for a larger

number of relationships than those due to common sires and dams. The fact that GLT and

PEVD were highly associated and that 94.5% of the total number of genetic links

between test groups was due to the use of common sires indicates that, in terms of

connectedness, common sires accounted for the most important relationships. Hanocq and

Boichard (1999) reported similar observations for the French national evaluation of

Holsteins. In their study, many of the additional relationships beyond those due to

common sires were within herds and, therefore, did not contribute to increasing the

accuracy of comparisons among herds. Before other relationships besides those due

to common sires and dams are included in the calculation of GLT, the extra computational cost

must be weighed against the resulting gain in accuracy of predicting PEVD.
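
Counting direct genetic links between two test groups can be sketched as follows. This section does not restate how a single link is counted, so the Python sketch assumes that a sire or dam with progeny in both groups contributes as many links as the smaller of its two progeny counts. That counting rule, the function name, and the animal identifiers are assumptions made for illustration.

```python
from collections import Counter

def glt(group_a, group_b):
    """Total number of direct genetic links between two test groups.

    Each group is a list of (sire_id, dam_id) tuples, one per tested bull.
    Assumption: a parent with progeny in both groups contributes
    min(progeny in group A, progeny in group B) links.
    """
    links = 0
    for position in (0, 1):                      # 0 = sire, 1 = dam
        count_a = Counter(bull[position] for bull in group_a)
        count_b = Counter(bull[position] for bull in group_b)
        for parent in count_a.keys() & count_b.keys():
            links += min(count_a[parent], count_b[parent])
    return links

# Hypothetical test groups.
tg1 = [("S1", "D1"), ("S1", "D2"), ("S2", "D3")]
tg2 = [("S1", "D4"), ("S3", "D3"), ("S3", "D5")]
tg3 = [("S4", "D6")]

links_12 = glt(tg1, tg2)   # sire S1 and dam D3 are shared
links_13 = glt(tg1, tg3)   # no common parents
print(links_12, links_13)
```

A count of zero identifies a pair of groups with no direct link, which is the case the measure is meant to flag before test groups are finalized.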

All measures of connectedness studied showed a decrease in the degree of

connectedness among test groups after 1994. Thus, the current trend in the accuracy of

comparisons of bulls tested in different test groups in Ontario is not favourable, even

though a requirement of a minimum of 12 bulls and 4 sires per test group has been

observed when determining the test groups.

To reverse the current trend in connectedness and increase the

accuracy of comparisons, recommendations must be developed. These recommendations

should include the following ideas. (1) Artificial insemination plays an important role

because it allows the distribution of progeny of sires across herds and, by extension,

across test groups. The planned use of common sires with high genetic values across

herds can increase connectedness among test groups, along with promoting genetic

improvement of the whole population. (2) The GLT, which was highly associated with

PEVD and has the practical advantage of not being computationally demanding, could be

rapidly determined through simulation when test groups are proposed and decisions could

be made to increase the number of genetic links among test groups, allowing accurate

comparison of EBV across test groups. When forming the test groups, the genetic

relationships among bulls within and across test groups have opposite effects, as shown

by Kennedy and Trus (1993). The degree of connectedness increases with relationships

across test groups, whereas it decreases when the within-group relationship increases.

Practical implications

The common practice in beef cattle of selecting on estimated breeding values

obtained by Best Linear Unbiased Prediction allows comparison of animals tested in different

environments (e.g., test groups), provided that environmental units are genetically

connected. A high degree of connectedness is associated with a high accuracy of

comparison of animals’ EBV tested in different environmental units, enabling increased

rates of genetic gain. There was no well-established procedure for measuring the degree

of connectedness among groups of station tested beef bulls. In this study, different

methods for measuring the degree of connectedness were compared. The GLT method,

which is based on the total number of direct genetic links due to common sires and dams

between test groups, was suggested for measuring the degree of connectedness among

groups of centrally tested beef bulls with the aim of improving the accuracy of

comparison of bulls’ EBV across test groups. This method is not computationally

demanding, enables completely disconnected test groups to be distinguished from

connected ones, and is highly correlated with PEV of comparison between groups of

bulls.

Limitations and suggestions for further investigations

In this investigation, the methods were evaluated under a univariate animal model.

Given that multiple trait models are widely used in the beef industry, investigations of

connectedness using multiple trait models, particularly when animals are not observed for

all traits and/or traits have different models, would merit further examination. Studies

including a simulation to evaluate the degree of dependence of the alternative methods on

the particular structure of the data are warranted.

Additive, dominance, and epistatic loss on pre-

weaning gain in crossing of different Bos taurus

breeds

Genetic evaluations involving purebred and crossbred animals from a large number of

breeds have been ongoing in Ontario for many years (Miller et al., 1995). One of the main

reasons for a multi-breed genetic evaluation is the possibility of comparing animals of

various breeds and breed constitutions in the pooled dataset, enabling effective use of the

genetic variability that exists in the whole population. Estimates of variance components,

heterosis, breed effects, and breed additive genetic changes have been previously

obtained in Ontario (Miller, 1996; Sullivan et al., 1999), but there were no available

estimates of separate direct and maternal dominance and epistatic loss effects associated

with breed heterozygosities. The main objective of Chapter 3 was to obtain estimates of

variance components, breed additive genetic changes, and breed additive, dominance, and

epistatic loss effects on pre-weaning gain in Ontario.

The database available to develop this study included data from approximately 60

different breeds, as well as crossbreds. Many of these breeds, however, were represented

by a small number of animals. Estimating all effects included in the genetic model for

those breeds with few records could be inaccurate. For this reason, a subset of the 10 most

popular breeds, including Angus, Blond D’Aquitane, Charolais, Gelbvieh, Hereford,

Limousin, Maine-Anjou, Salers, Shorthorn, and Simmental, was chosen for the analysis.

The GLT method, which was described in Chapter 2, was used to identify a subset of

genetically connected contemporary groups across breeds to be used in the analysis. This

procedure was intended to minimize possible confounding between contemporary group

and genetic effects.

Estimates of variance components obtained in Chapter 3 did not greatly differ from

previous studies in Ontario. Expressed as proportions of the phenotypic variance, direct

additive genetic, maternal additive genetic, maternal permanent environment, and residual

variances were equal to 0.32, 0.20, 0.12, and 0.52, respectively. A strong genetic

correlation of –0.63 between direct and maternal effects was found. This correlation

seemed to be more likely a consequence of lack of enough information in the dataset to

separate partially confounded effects than an indication of a true antagonistic relationship.
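
The four proportions above add to more than one, which is expected when the direct-maternal covariance is negative. As a rough consistency check, assume the usual decomposition of the phenotypic variance for a maternally influenced trait, in which the direct-maternal covariance $\sigma_{am}$ enters once. The decomposition and the symbols $h^2_a$, $h^2_m$, $c^2$, and $e^2$ (the four proportions in the order listed) are notation introduced here, not taken from Chapter 3.

```latex
\begin{align*}
\frac{\sigma_{am}}{\sigma^2_p}
  &= r_{am}\sqrt{h^2_a\,h^2_m}
   = -0.63\sqrt{0.32 \times 0.20} \approx -0.16,\\
h^2_a + h^2_m + \frac{\sigma_{am}}{\sigma^2_p} + c^2 + e^2
  &\approx 0.32 + 0.20 - 0.16 + 0.12 + 0.52 = 1.00 .
\end{align*}
```

Under this assumption the reported proportions and the genetic correlation of –0.63 are mutually consistent.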

Annual breed additive genetic changes obtained for Angus, Charolais, Hereford,

Limousin, and Simmental, using two different approaches, indicated positive annual

additive genetic changes. The traditional approach based on yearly average breeding

values of purebred calves and the alternative approach based on regression of yearly

estimated breeding values on contribution of each breed to the breed composition of the

calves revealed differences in selection practices among breeds. Producers used animals

of substantially higher additive genetic values to produce purebred Charolais, Hereford,

and Simmental than to produce crossbred animals. Producers of Angus and Limousin

used animals of similar genetic values to produce both purebred and crossbred animals.

Direct and maternal dominance effects were favourable and equivalent to

1.31% and 2.28% of the phenotypic average, respectively. Direct and maternal epistatic loss effects

were equivalent to –2.19% and –0.08%, respectively, but the maternal epistatic loss effect

was not significantly different from zero (P > 0.05). To detect significant effects, a larger

variety of crossbred sires in the dataset might be required. Standard errors of maternal

dominance and direct epistatic loss effects were large, in comparison to standard errors of

direct dominance and maternal epistatic loss effects. Additional analysis excluding

epistatic loss effects resulted in estimates of direct and maternal dominance of 1.31% and

1.84%, respectively. Therefore, estimates of dominance from both models did not differ

greatly. The estimates of direct and maternal dominance effects obtained in this

investigation were substantially lower than the heterosis of 5% assumed in the genetic

evaluation procedures for pre-weaning gain currently used in Ontario.

Estimates of breed additive effects were in general agreement with expectations

based on previous studies in Ontario. Standard errors of the estimates of breed effects and

sampling correlations between estimates of direct and maternal breed effects were high.

Practical implications


Estimates of variance components obtained in this study were in line with previous

studies in Ontario. Because the strong negative genetic correlation between direct and

maternal genetic effects was more likely due to lack of enough information in the dataset

to accurately estimate these two partially confounded effects, it seems reasonable to

assume a zero genetic correlation between direct and maternal effects in the genetic

evaluation until an alternative parameter is verified with further research.

In the multi-breed genetic evaluation currently run in Ontario, pre-weaning gain

records are pre-adjusted for direct and maternal heterosis assuming a level of 5% for both,

based on average values from literature. Given the accumulated evidence that the level of

direct and maternal heterosis on pre-weaning gain is lower than 5%, this level should be

reviewed in the multi-breed genetic evaluation in this population.

Large standard errors of the estimates of breed effects and high sampling correlations

between estimates of direct and maternal breed effects can be a symptom of a lack of

sufficient information to estimate both direct and maternal breed effects and/or

multicollinearity among predictor variables of breed effects. Because estimates of breed

effects comprise part of the across-breed estimated breeding values (across-breed

comparisons or ABC) used as selection criterion across breeds, problems with accurate

estimation of breed effects may result in unreliable ranking of the animals.

Limitations and suggestions for further investigations


Direct and maternal epistatic loss effects were estimated using average

heterozygosities of the parents of an individual (ED) and its mother (EM) as predictor

variables (Fries et al., 2000). Coefficients ED and EM are easily determined because they

are simple functions of the heterozygosities of the parents. In addition, they allow for

differentiating the amount of epistatic loss in the F2 from the amount in the F3 and further

advanced generations. ED and EM have a relative probabilistic interpretation, but they are

not directly biologically interpretable. The average epistatic loss due to the breakdown of

all kinds of gene interactions, as a deviation from the average additive and dominance

effects, is estimated by ED and EM (Fries et al., 2002). A drawback of this approach is

that recombination between uniting gametes is partially confounded with dominance

effects, as occurs in the definition of recombination loss (Dickerson, 1973). The

drawback comes from the fact that interactions between genes (in the same locus or in

different loci) are taken as a dominance effect, while interactions between genes

occurring one generation back, at the gamete level, are taken as recombination or epistatic

loss. Further investigations to compare ED and EM with different predictor variables for

epistatic loss are warranted.
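
The coefficients can be illustrated with a two-breed cross. The Python sketch below assumes that breed heterozygosity is computed as one minus the sum over breeds of the product of the sire and dam breed fractions, a common convention that this section does not spell out. The function names and the Angus by Hereford example are illustrative only.

```python
def heterozygosity(sire_comp, dam_comp):
    """Expected breed heterozygosity: 1 - sum over breeds of (sire fraction x dam fraction)."""
    breeds = set(sire_comp) | set(dam_comp)
    return 1.0 - sum(sire_comp.get(b, 0.0) * dam_comp.get(b, 0.0) for b in breeds)

def epistatic_loss_coefficient(h_parent_1, h_parent_2):
    """Average heterozygosity of two parents (ED for a calf, EM for its dam)."""
    return 0.5 * (h_parent_1 + h_parent_2)

pure_an = {"AN": 1.0}
pure_he = {"HE": 1.0}
half = {"AN": 0.5, "HE": 0.5}          # breed composition of F1, F2, and F3 animals

h_pure = heterozygosity(pure_an, pure_an)   # purebred parent
h_f1 = heterozygosity(pure_an, pure_he)     # F1 individual
h_f2 = heterozygosity(half, half)           # F2 individual (F1 x F1)
h_f3 = heterozygosity(half, half)           # F3 individual (F2 x F2)

ed_f1 = epistatic_loss_coefficient(h_pure, h_pure)   # parents are purebred
ed_f2 = epistatic_loss_coefficient(h_f1, h_f1)       # parents are F1
ed_f3 = epistatic_loss_coefficient(h_f2, h_f2)       # parents are F2
print(h_f1, h_f2, h_f3, ed_f1, ed_f2, ed_f3)
```

In this example the F2 and F3 have the same heterozygosity but different ED, which is the distinction between generations mentioned above. EM is obtained in the same way from the heterozygosities of the parents of the dam.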

The breeds included in this study were from distinct biological types. An adequate

decomposition of dominance and epistatic loss effects should consider possible specific

combining ability between pairs of breeds or biological types. However, due to data

structure limitations involving some breeds, the same dominance and epistatic loss effects

were assumed for crosses of different pairs of breeds.

The database available to develop this study includes data from approximately 60

different pure breeds, as well as crossbreds, but only breeds represented by a large

number of animals were considered in the analysis. A question that arises is how to

evaluate animals from breeds represented by a small number of animals in the dataset. An

investigation to evaluate possible alternatives is warranted.

Further work is needed to investigate the nature of the strong antagonistic genetic

correlation between direct and maternal genetic effects. A simulation study could be used

to determine the data structure required to generate accurate genetic correlations between

direct and maternal effects.

Estimates of breed additive effects are included in the across-breed estimated

breeding value used as selection criterion in multi-breed beef cattle. Because estimates of

breed effects were highly unstable, showing high standard errors, an investigation to

detect the causes of instability and application of alternative statistical methods was

conducted in Chapter 4.

Estimation of genetic effects in the presence of

multicollinearity

In Chapter 4, a framework was developed for obtaining stable estimates of breed additive,

dominance, and epistatic loss effects on pre-weaning gain when multicollinearity among

predictor variables is of concern. The framework consisted of

first identifying the predictor variables involved in multicollinearity and then

applying ridge regression methods in the estimation of direct and maternal breed additive,

dominance, and epistatic loss effects. The genetic model used in the analysis accounted

for all known genetic relationships among animals through the additive relationship

matrix, which was possible because all animals had complete pedigree information. The

application of such a complete model is generally not possible for field datasets,

particularly when multiple sire mating groups are used.

Multicollinearity diagnostics performed in Chapter 4 indicated that predictor

variables of direct and maternal breed additive effects were the main candidates for linear

dependencies, followed by predictor variables of maternal dominance and direct epistatic

loss effects. Mathematical constraints among predictor variables and the small proportion

of crossbred sires in the dataset seemed to be the main causes of multicollinearity.
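
As an illustration of the kind of diagnostics described above, the following Python sketch computes variance inflation factors and condition indexes for a small simulated dataset. The data are hypothetical: a single breed A, purebred sires that mostly match the breed of the dam, and one unrelated covariate. It is not the procedure or the data used in Chapter 4, and the condition indexes are computed from the correlation matrix, which is only one of several possible variants.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_indexes(X):
    """sqrt(largest eigenvalue / each eigenvalue) of the correlation matrix."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return np.sqrt(eig.max() / eig)

# Hypothetical toy data (an assumption, not the thesis dataset)
rng = np.random.default_rng(1)
n = 2000
dam = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.45, 0.10, 0.45])  # dam fraction of breed A
dam_main = np.where(dam == 0.5, rng.integers(0, 2, n), dam)      # breed the sire is matched to
same = rng.random(n) < 0.9                                       # 90% of sires match the dam
sire = np.where(same, dam_main, 1.0 - dam_main)                  # purebred sires only (0 or 1)

direct = 0.5 * (sire + dam)   # calf fraction of breed A (direct breed additive predictor)
maternal = dam                # dam fraction of breed A (maternal breed additive predictor)
other = rng.normal(size=n)    # covariate unrelated to breed composition
X = np.column_stack([direct, maternal, other])

vifs = vif(X)
cond = condition_indexes(X)
print("VIF:", np.round(vifs, 2))
print("condition indexes:", np.round(cond, 2))
```

In this toy setting, the direct and maternal predictors are close to each other because few matings involve a sire of a different breed than the dam, so their VIF are expected to be well above that of the unrelated covariate.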

The choice of an adequate ridge parameter is one of the most important tasks in a

ridge regression analysis. Two ridge regression methods were used to determine the ridge

parameter: R1 was the generalized ridge estimator of Hoerl and Kennard (1970a) and R2

was based on bootstrap and cross-validation (Delaney and Chatterjee, 1986), extended to

obtain diagonal elements of the ridge parameter matrix K proportional to the variance

inflation factor of each regression coefficient under LS. With this procedure, unnecessary

bias is not imposed on those predictor variables not seriously involved in

multicollinearity.
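
A minimal sketch of a ridge estimator whose diagonal ridge parameters are proportional to the LS variance inflation factors is given below. The scalar k, the scaling by the largest VIF, and the simulated predictors are assumptions made for illustration; in Chapter 4 the ridge parameters were determined by the R1 and R2 procedures rather than fixed in advance.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit length, so Z'Z is the correlation matrix."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def generalized_ridge(Z, yc, K):
    """Return (Z'Z + K)^-1 Z'y and the diagonal of (Z'Z + K)^-1 Z'Z (Z'Z + K)^-1."""
    A = Z.T @ Z
    M = np.linalg.inv(A + K)
    beta = M @ Z.T @ yc
    vif = np.diag(M @ A @ M)   # variance inflation factors of the ridge estimates
    return beta, vif

# Hypothetical predictors: x1 and x2 nearly collinear, x3 not involved
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 0.5 * x2 - 1.0 * x3 + rng.normal(size=n)

Z = standardize(X)
yc = y - y.mean()
vif_ls = np.diag(np.linalg.inv(Z.T @ Z))    # VIF under LS

k = 0.05                                    # hypothetical overall ridge scale
K = np.diag(k * vif_ls / vif_ls.max())      # k_i proportional to VIF_i
beta_ls, _ = generalized_ridge(Z, yc, np.zeros((3, 3)))   # K = 0 recovers LS
beta_r, vif_r = generalized_ridge(Z, yc, K)

print("ridge parameters:", np.round(np.diag(K), 4))
print("VIF under LS:    ", np.round(vif_ls, 2))
print("VIF under ridge: ", np.round(vif_r, 2))
```

Because x3 has a VIF close to one, its ridge parameter is a small fraction of k, so little shrinkage is applied to a coefficient that is not involved in the dependency.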

Ridge parameters determined by both ridge regression methods resulted in a set of

estimates of regression coefficients with smaller MSEP and lower VIF than LS. Average

MSEP under both ridge regression methods, obtained over one hundred bootstrap

samples, was 3% lower than under LS. Average VIF under ridge regression methods

R1 and R2 was 77% and 84% lower than under LS, respectively. A model that results in

lower VIF and smaller MSEP is desirable because these statistics indicate stability in the

estimates (lower standard errors) and the ability of the model to predict future

observations.
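
One way to compare the MSEP of LS and ridge regression over bootstrap samples is sketched below in Python. The out-of-bag prediction scheme, the single scalar ridge parameter k, and the simulated data are assumptions for illustration and may differ from the procedure used in Chapter 4.

```python
import numpy as np

def fit_ridge(X, y, k):
    """Ridge fit on centered data; k = 0 gives least squares. Returns (intercept, beta)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    beta = np.linalg.solve(Xc.T @ Xc + k * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return ym - xm @ beta, beta

def bootstrap_msep(X, y, k, n_boot=100, seed=0):
    """Average squared prediction error on records left out of each bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                # sample records with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # records not drawn (out-of-bag)
        b0, b = fit_ridge(X[idx], y[idx], k)
        errors.append(np.mean((y[oob] - b0 - X[oob] @ b) ** 2))
    return float(np.mean(errors))

# Hypothetical data with two nearly collinear predictors
rng = np.random.default_rng(7)
n = 80
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

msep_ls = bootstrap_msep(X, y, k=0.0)      # least squares
msep_ridge = bootstrap_msep(X, y, k=1.0)   # ridge with a hypothetical k
print("MSEP LS:   ", round(msep_ls, 4))
print("MSEP ridge:", round(msep_ridge, 4))
```

Using the same seed for both calls means that LS and ridge are evaluated on identical bootstrap samples, so the difference in average MSEP reflects the estimator rather than the resampling.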

Due to multicollinearity among predictor variables of both direct and maternal breed

additive effects, most regression coefficients of breed effects given by LS showed large

standard errors and were highly confounded. The use of ridge regression methods tended

to alleviate these symptoms of multicollinearity among breed effects. Estimates of breed

effects given by ridge regression were more stable (had lower standard errors) and

showed a lower degree of confounding than estimates given by LS. These desirable

properties of estimates obtained by ridge regression methods will increase the probability

of properly ranking the breeds, which will ultimately result in a suitable ranking of the

animals in the across-breed comparisons.

Estimates of maternal dominance and direct epistatic loss effects had large standard

errors and were highly confounded under LS. The use of ridge regression methods only

slightly reduced the standard errors and the degree of confounding between these

estimates, suggesting that the small proportion of crossbred sires in the dataset,

aggravated by linear dependencies between corresponding predictor variables, did not

provide enough information to effectively separate maternal dominance and direct

epistatic loss effects, even though both effects were statistically significant.

As a consequence, the inclusion of epistatic loss effects in the genetic model did not cause

noticeable re-ranking of the animals in the across-breed comparison.


The assessment of the causes and consequences of multicollinearity in the genetic

evaluation of multi-breed populations, and the development of methods to minimize its

negative consequences, have received little attention. In this study, a

framework using ridge regression methods was developed to deal with multicollinearity

problems in the genetic evaluation of multi-breed populations.

With an adequate choice of the ridge parameter, ridge regression methods resulted in

lower VIF and smaller MSEP than LS. Estimates obtained by ridge regression were more

stable (lower standard errors) and could be used with advantage over LS for prediction

purposes. Ridge regression methods were particularly effective in alleviating the symptoms

of multicollinearity caused by linear dependencies among predictor variables of breed

additive effects. Besides lower standard errors, estimates of breed effects under ridge

regression methods showed a lower degree of confounding than those under LS. Thus, the

use of the estimates obtained by ridge regression methods can increase the probability of

properly ranking the animals in the across-breed comparisons. In addition, more

consistent across-breed estimated breeding values in successive genetic evaluations can

be expected.

The limited variety of crosses, due to the small proportion of crossbred sires in the dataset,

together with the linear dependencies among predictor variables, provided little

opportunity for obtaining accurate separate estimates of dominance and epistatic loss

effects. Due to the high degree of confounding between estimates of maternal dominance

and epistatic loss effects, it was not possible to compare the relative importance of these

components of heterosis with a high level of confidence. The

inclusion of epistatic loss effects in the standard additive-dominance model used in the

multi-breed genetic evaluation does not cause appreciable re-ranking of animals on the

basis of across-breed estimated breeding values.


The variance of a particular breed additive effect estimate depends on the number of

animals from that breed in the analysis and on the degree of multicollinearity involving the

corresponding predictor variable. The framework developed herein, based on

ridge regression methods, offers an alternative way to account for multicollinearity. A

limitation of ridge regression is that it ignores the number of animals in each breed

when shrinking the estimates to break the dependencies. In this investigation, breeds with

a larger number of records were more strongly associated with multicollinearity. As a

consequence, estimates for breeds with a larger number of animals were shrunk to a higher

degree than those for breeds with a smaller number of records, even though the former

group of breeds showed lower standard errors under LS. Although the estimates obtained by

ridge regression were more stable, the resulting reduction in estimated breed differences

should be further evaluated. Investigation of other evaluated traits and comparison with

alternative methods to deal with multicollinearity, such as treating genetic groups as

random, are warranted.

References

AAFC. 1993. National standards document for the genetic improvement of Canadian

beef cattle. Livestock Development Division, Agriculture Development Branch,

Agriculture Canada, Ottawa, Ontario.

Arthur, P. F., Hearnshaw, H. and Stephenson, P. D. 1999. Direct and maternal additive

and heterosis effect from crossing Bos indicus and Bos taurus cattle: cow and calf

performance in two environments. Livest. Prod. Sci. 57: 231-241.

Belsley, D. A. 1991. Conditioning diagnostics, collinearity and weak data in regression.

1st ed. John Wiley and Sons, Inc., New York. 396pp.

Belsley, D. A., Kuh, E. and Welsch, R. E. 1980. Regression diagnostics. 1st ed. John

Wiley and Sons, Inc., New York. 320pp.

Cardoso, V. 2004. Direcionando acasalamentos para maximizar a média do valor

genotípico de uma futura safra. Ph.D. Thesis, Universidade Estadual Paulista,

Faculdade de Ciências Agrárias e Veterinárias, Campus de Jaboticabal, Brazil, 101p.

Cassady, J. P., Young, L. D. and Leymaster, K. A. 2002. Heterosis and recombination

effects on pig growth and carcass traits. J. Anim. Sci. 80: 2286-2302.

Chatterjee, S., Hadi, A. S. and Price, B. 2000. Regression analysis by example. 3rd ed.

John Wiley and Sons, Inc., New York. 359pp.

Cunningham, E. P. 1987. Crossbreeding - The Greek Temple Model. J. Anim. Breed.

Genet. 104: 2-11.

Cunningham, E. P. and Connolly, J. 1989. Efficient design of crossbreeding

experiments. Theor. Appl. Genet. 78: 381-386.

Delaney, N. J. and Chatterjee, S. 1986. Use of the bootstrap and cross-validation in

ridge regression. J. Bus. Econ. Statist. 4: 255-262.

Demeke, S., Neser, F. W. C. and Schoeman, S. J. 2003. Early growth performance of

Bos taurus x Bos indicus cattle crosses in Ethiopia: Evaluation of different

crossbreeding models. J. Anim. Breed. Genet. 120: 39-50.

Dickerson, G. E. 1969. Experimental approaches in utilising breed resources. Anim.

Breed. Abstr. 37: 191-202.

Dickerson, G. E. 1973. Inbreeding and heterosis in animals. Proc. of the Animal

Breeding and Genetics Symp. in Honor of Dr. J. L. Lush. pp. 54-77. ASAS

Champaign, IL.

Draper, N. R. and Smith, H. 1998. Applied regression analysis. 3rd ed. John Wiley and

Sons, Inc., New York. 706pp.

Efron, B. 1979. Bootstrap methods: Another look at the Jackknife. Ann. Stat. 7: 1-26.

Elzo, M. A., Jara, A. and Barria, N. 2004. Genetic parameters and trends in the Chilean

multibreed dairy cattle population. J. Dairy Sci. 87: 1506-1518.

Foulley, J. L., Hanocq, E. and Boichard, D. 1992. A criterion for measuring the degree

of connectedness in linear models of genetic evaluation. Gen. Sel. Evol. 24: 315-330.

Freund, R. and Littell, R. C. 2000. SAS® System for Regression. 3rd ed. Cary, NC:

SAS Institute Inc. 236pp.

Fries, L. A. 1998. Connectability in beef cattle genetic evaluation: the heuristic approach

used in MILC.FOR. Proc. 6th World Cong. Genet. Appl. Livest. Prod., Armidale,

NSW, Australia. 27: 449-500.

Fries, L. A., Johnston, D. J., Hearnshaw, H. and Graser, H. U. 2000. Evidence of

epistatic effects on weaning weight in crossbred beef cattle. Asian-Aust. J. Anim.

Sci. 13(Suppl. B): 242.

Fries, L. A., Schenkel, F. S., Roso, V. M., Brito, F. V., Severo, J. L. P. and Piccoli, M.

L. 2002. “Epistazygosity” and epistatic effects. Proc. 7th World Cong. Genet. Appl.

Livest. Prod., Montpellier, France. Communication No 17-15.

Goldstein, M. and Smith, A. F. M. 1974. Ridge-type estimators for regression analysis.

J. Roy. Statist. Soc. 36: 284-291.

Gregory, K. E., Cundiff, L. V. and Koch, R. M. 1997. Composite breeds to use

heterosis and breed differences to improve efficiency of beef production. Roman L.

Hruska U. S. MARC, Clay Center, NE.

Groeneveld, E. 1990. PEST User’s Manual. Institute of Animal Husbandry and Animal

Behaviour, Federal Agricultural Research Centre, Germany.

Gruber, M. H. J. 1998. Improving efficiency by shrinkage: the James-Stein and ridge

regression estimators. 1st ed. Marcel Dekker, New York. 632pp.

Hanocq, E. and Boichard, D. 1999. Connectedness in the French Holstein cattle

population. Gen. Sel. Evol. 31: 163-176.

Hébel, P., Faivre, R., Goffinet, B. and Wallach, D. 1993. Shrinkage estimators applied

to prediction of French winter wheat yield. Biometrics. 49: 281-293.

Hoerl, A. E. 1962. Application of ridge analysis to regression problems. Eng. Progress.

58: 54-59.

Hoerl, A. E. and Kennard, R. W. 1970a. Ridge regression: Biased estimation for

nonorthogonal problems. Technometrics. 12: 55-67.

Hoerl, A. E. and Kennard, R. W. 1970b. Ridge regression: Application to

nonorthogonal problems. Technometrics. 12: 69-82.

Johnston, D. J., Tier, B., Graser, H. and Girard, C. 1999. Presenting BREEDPLAN

Version 4.1. Proc. Assoc. Advmt. Animal Breed. Genet. 13: 193-196.

Kennedy, B. W. and Trus, D. 1993. Considerations on genetic connectedness between

management units under an animal model. J. Anim. Sci. 71: 2341-2352.

Kinghorn, B. 1983. Genetic effects in crossbreeding. III. Epistatic loss in crossbred mice.

Z. Tierzüchtg. Züchtgsbiol. 100: 209-222.

Kinghorn, B. P. and Vercoe, P. E. 1989. The effects of using the wrong genetic model

to predict the merit of crossbred genotypes. Anim. Prod. 49: 209-216.

Klei, L., Quaas, R. L., Pollak, E. J. and Cunningham, B. E. 2002. Multiple-breed

evaluation. Available: http://www.abc.cornell.edu.tmprols/doc1.pdf. Accessed May

12, 2004.

Koch, R. M., Dickerson, G. E., Cundiff, L. V. and Gregory, K. E. 1985. Heterosis

retained in advanced generations of crosses among Angus and Hereford cattle. J.

Anim. Sci. 60: 1117-1132.

Koots, K., Gibson, J. P. and Wilton, J. W. 1994a. Analysis of published parameter

estimates for beef production traits. 1. Heritability. Anim. Breed. Abstr. 62: 309-337.

Koots, K., Gibson, J. P. and Wilton, J. W. 1994b. Analysis of published parameter

estimates for beef production traits. 2. Phenotypic and genetic correlations. Anim.

Breed. Abstr. 62: 825-853.

Laloë, D. 1993. Precision and information in linear models of genetic evaluation. Gen.

Sel. Evol. 25: 557-576.

Long, C. R. 1980. Crossbreeding for beef production: experimental results. J. Anim. Sci.

51: 1197-1223.

Lowerre, J. M. 1974. On the mean square error of parameter estimates for some biased

estimators. Technometrics. 16: 461-464.

Madsen, P. and Jensen, J. 2000. DMU – A package for analysing multivariate mixed

models. Danish Institute of Agricultural Sciences (DIAS), Denmark.

Marquardt, D. W. and Snee, R. D. 1975. Ridge regression in practice. Amer. Statist. 29:

3-20.

Mathur, P. K., Sullivan, B. P. and Chesnais, J. P. 1999. Estimation of the degree of

connectedness between herds or management groups in the Canadian swine

population. Canadian Centre for Swine Improvement, Ottawa, Canada. (Mimeo).

Mathur, P. K., Sullivan, B. P. and Chesnais, J. P. 2002. Measuring connectedness:

concept and application to a large industry program. Proc. 7th World Cong. Genet.

Appl. Livest. Prod., Montpellier, France. Communication No 20-13.

Meyer, K. 1989. Approximate accuracy of genetic evaluation under an animal model.

Livest. Prod. Sci. 21: 87-100.

Meyer, K. 1992. Variance components due to direct and maternal effects for growth traits

of Australian beef cattle. Livest. Prod. Sci. 31: 179-204.

Miller, S. P. 1996. Studies on genetic evaluation and the effect of milk yield on profit

potential in a multi-breed beef cattle population. Ph.D. Thesis, University of Guelph,

Canada. 217p.

Miller, S. P., Wilton, J. W. and Griffiths, S. J. 1995. Utilizing multi-breed genetic

evaluations in beef cattle breeding. Proc. Aust. Assoc. Anim. Breed. Genet. 11: 254.

Misztal, I. and Wiggans, G. R. 1988. Approximation of prediction error variance in

large-scale animal models. J. Dairy Sci. 71(Suppl. 2): 27(Abstr.).

Piccoli, M. L., Roso, V. M., Brito, F. V., Severo, J. L. P., Schenkel, F. S. and Fries, L.

A. 2002. Additive, complementarity (additive x additive), dominance and epistatic

effects on pre-weaning gain of Hereford x Nelore calves. Proc. 7th World Cong.

Genet. Appl. Livest. Prod., Montpellier, France. Communication No 17-16.

Pimentel, E. C. G., Queiroz, S. A., Carvalheiro, R. and Fries, L. A. 2003. Efeitos da

inclusão de epistasia e complementariedade em modelos de avaliação genética em

bovinos de corte. In.: Reunião Anual da Sociedade Brasileira de Zootecnia, 40. Santa

Maria-RS, Brazil.

Pimentel, E. C. G., Cardoso, V., Carvalheiro, R., Queiroz, S. A. and Fries, L. A.

2004. Predições de desempenho de gerações avançadas conforme diferentes modelos

de avaliação de animais cruzados. In.: Reunião Anual da Sociedade Brasileira de

Zootecnia, 41. Campo Grande-MS, Brazil.

Pollak, E. J. and Quaas, R. L. 1998. Multibreed genetic evaluations of beef cattle.

Proc. 6th World Cong. Genet. Appl. Livest. Prod., Armidale, NSW, Australia. 23: 81-

88.

Robinson, D. L. 1996. Estimation and interpretation of direct and maternal genetic

parameters for weights of Australian Angus cattle. Livest. Prod. Sci. 45: 1-11.

Rodríguez-Almeida, F. A., Van Vleck, L. D. and Gregory, K. E. 1997. Estimation of

direct and maternal breed effects for prediction of expected progeny differences for

birth and weaning weights in three multibreed populations. J. Anim. Sci. 75: 1203-

1212.

Roso, V. M. and Fries, L. A. 1998. Maternal and individual heterozygosities and

heterosis on pre-weaning gain of Angus x Nelore calves. Proc. 6th World Cong.

Genet. Appl. Livest. Prod., Armidale, Australia.

Roso, V. M., Schenkel, F. S. and Miller, S. P. 2004. Degree of connectedness among

groups of centrally tested beef bulls. Can. J. Anim. Sci. 84: 37-47.

SAS. 1990. SAS/STAT User’s Guide (Version 6). SAS Inst. Inc., Cary, NC.

Sheridan, A. K. 1981. Crossbreeding and heterosis. Anim. Breed. Abstr. 49: 131-144.

Smith, C. 1984. Rates of genetic change in farm livestock. Res. Develop. Agric. 1: 79-85.

Sorensen, D. and Gianola, D. 2002. Likelihood, Bayesian, and MCMC methods in

quantitative genetics. Springer-Verlag New York, Inc. New York. 740pp.

Sullivan, P. G., Wilton, J. W., Miller, S. P. and Banks, L. R. 1999. Genetic trends and

breed overlap derived from multiple-breed genetic evaluation of beef cattle for growth

traits. J. Anim. Sci. 77: 2019-2027.

Weisberg, S. 1985. Applied linear regression. 2nd ed. John Wiley and Sons, Inc., New

York. 324pp.

Wood, C. M., Christian, L. L. and Rothschild, M. F. 1991. Evaluation of performance-

tested boars using single-trait animal model. J. Anim. Sci. 69: 3144-3155.
