GENETIC EVALUATION OF MULTI-BREED BEEF CATTLE
A Thesis
Presented to
The Faculty of Graduate Studies
of
The University of Guelph
by
VANERLEI MOZAQUATRO ROSO
In partial fulfilment of requirements
for the degree of
Doctor of Philosophy
November, 2004
© Vanerlei Mozaquatro Roso, 2004
Advisory Committee: Dr. Stephen P. Miller (Advisor)
Dr. Flávio S. Schenkel
Dr. Gary J. Umphrey
Dr. James W. Wilton
Dr. Lawrence R. Schaeffer
ABSTRACT
GENETIC EVALUATION OF MULTI-BREED BEEF CATTLE
Vanerlei Mozaquatro Roso, University of Guelph, 2004
Advisor: Professor Stephen Paul Miller
Three alternative methods for measuring the degree of connectedness among test
groups (TG), including variance of estimated differences between TG effects (VED),
connectedness rating (CR), and total number of direct genetic links between TG due to
common sires and dams (GLT), which could be routinely used in genetic evaluation
programs, were evaluated. Data were consecutive weights of bulls tested in central
evaluation stations in Ontario, Canada. The prediction error variance of differences in
estimated breeding values of bulls from different TG (PEVD) was assumed to be the most
adequate measure of connectedness, and results from VED, CR, and GLT were compared
relative to PEVD. Average PEVD of pairs of TG can be more accurately predicted on the
basis of GLT than on the basis of either VED or CR. Average PEVD of each TG with all
other test groups can be more accurately predicted on the basis of either CR or GLT.
The GLT, which is not excessively demanding computationally, was used to identify a set
of connected contemporary groups including both purebred and crossbred animals from
beef herds in Ontario. Estimates of variance components, breed additive genetic changes,
direct and maternal breed, dominance, and epistatic loss genetic effects on pre-weaning
weight gain (PWG) were obtained. Both direct and maternal dominance effects were
assumed proportional to breed heterozygosity and showed favourable effects on PWG.
Direct epistatic loss reduced the performance of the animals, whereas maternal epistatic
loss did not significantly affect the PWG. Breeds ranked similarly to what was expected,
but estimates were highly unstable, with high standard errors, possibly due to
multicollinearity, which can result in inaccurate across-breed estimated breeding values.
A framework using ridge regression methods was developed to obtain more stable
estimates of direct and maternal breed, dominance, and epistatic loss effects on PWG
when multicollinearity is of concern. Two generalized methods were applied in the choice
of the ridge parameter. Once the choice of the ridge parameter was made, its reliability
and validity were evaluated through bootstrap resampling procedures. Mean squared error
of prediction (MSEP) of both ridge regression methods was 3% lower than the MSEP
from ordinary least squares. Ridge regression methods were effective in reducing the
multicollinearity involving predictor variables of breed effects.
ACKNOWLEDGEMENTS
I am particularly grateful to my advisor Dr. Stephen P. Miller for giving me the
opportunity to develop my graduate studies at University of Guelph. His enthusiasm,
encouragement, guidance, and friendship during my graduate program were appreciated. I
would like to extend sincere acknowledgements to the other members of my advisory
committee, Dr. Flávio S. Schenkel, Dr. Gary J. Umphrey, Dr. James W. Wilton, and Dr.
Lawrence R. Schaeffer for their time, advice, and contributions to the manuscript. Thanks
to Dr. Peter G. Sullivan, Dr. Luiz A. Fries, and Dr. Roberto Carvalheiro for their
suggestions.
I would like to acknowledge the faculty, staff, students, and visiting scientists at the
Department of Animal and Poultry Science for their help, kindness, and support, making
my graduate program a pleasant experience.
A special thanks to my friends Flávio, Sandra, Mariana, and Daniel, who made me
feel at home during my stay in Guelph, and to my family, for their continuous support
and love.
I am thankful to my partners at GenSys Consultores Associados in Brazil, Fernanda
V. Brito, Jorge L. P. Severo, Luiz A. Fries, and Mario L. Piccoli for their extra effort to
cover my temporary leave of absence, which allowed me to pursue a Ph.D. at the
University of Guelph.
I would like to thank Beef Improvement Ontario (BIO) for providing data and
financial support, Natural Sciences and Engineering Research Council of Canada, and
Ontario Ministry of Agriculture and Food for financial support, and the Canadian
Foundation for Innovation, Ontario Innovation Trust, and Compaq for supporting the
required computing infrastructure.
TABLE OF CONTENTS
1. General Introduction
2. Degree of connectedness among groups of centrally tested beef bulls
    Abstract
    Introduction
    Material and Methods
        Data
        Statistical model
        Measures of the degree of connectedness
            Prediction error variance of differences in EBV of bulls (â) from different test groups (PEVD)
            Variance of estimated differences between test group effects (ĝ) (VED)
            Connectedness rating (CR)
            Total number of direct genetic links between test groups (GLT)
    Results
        Connectedness
        Prediction of PEVD on the basis of VED, CR, and GLT
            Average PEVD of pairs of TG
                On the basis of VED
                On the basis of CR
                On the basis of GLT
            Average PEVD of each TG with all other TG
                On the basis of VED
                On the basis of CR
                On the basis of GLT
        Simulation of disconnected test groups
    Discussion
    Conclusions
3. Additive, dominance, and epistatic loss effects on pre-weaning gain in crossing of different Bos taurus breeds
    Abstract
    Introduction
    Material and Methods
        Data
        Connectedness analysis
        Predictor variables of fixed genetic effects
            Breed additive effects
            Dominance effects
            Epistatic loss effects
        Genetic analysis
        Multi-breed additive genetic changes
    Results
        (Co)variance components
        Multi-breed additive genetic changes
        Dominance and epistatic loss effects
        Breed additive effects
        Sampling correlations
    Discussion
    Conclusions
4. Estimation of genetic effects in the presence of multicollinearity
    Abstract
    Introduction
    Material and Methods
        Data
        Predictor variables of fixed genetic effects
            Breed additive effects
            Dominance effects
            Epistatic loss effects
        Multicollinearity diagnostics
            Variance inflation factor
            Condition index
            Variance-decomposition proportions associated with the eigenvalues
        Genetic analysis
        Ridge regression
        Objective methods for selecting the ridge parameter K
            Generalized Ridge Estimator of Hoerl and Kennard (R1)
            Bootstrap in combination with cross-validation (R2)
        Mean squared error of prediction and variance inflation factor
        Bias measurement
        Comparison of across-breed estimated breeding values
            Additive-dominance models
            Additive-dominance-epistatic models
    Results
        Multicollinearity diagnostics
        Ridge parameter K
        Convergence of estimates of fixed genetic effects
        Mean squared error of prediction and variance inflation factor
        Bias measurement
        Dominance and epistatic loss effects
        Breed additive effects
        Sampling correlations
        Comparison of across-breed estimated breeding values
        Use of the same ridge parameter in subsequent genetic evaluations
    Discussion
    Conclusions
5. General Discussion
    Degree of connectedness among test groups of centrally tested beef bulls
        Practical implications
        Limitations and suggestions for further investigations
    Additive, dominance, and epistatic loss effects on pre-weaning gain in crossing of different Bos taurus breeds
        Practical implications
        Limitations and suggestions for further investigations
    Estimation of genetic effects in the presence of multicollinearity
        Practical implications
        Limitations and suggestions for further investigations
6. References
LIST OF TABLES
Table 2.1. Summary of the bull test data
Table 2.2. Correlations among PEV of the difference between EBV of bulls from different test groups (PEVD), variance of estimated differences between test group effects (VED), connectedness rating (CR) and total number of direct genetic links between test groups (GLT) for pairs of test groups (above diagonal) and for averages of each test group with all other test groups (below diagonal)
Table 2.3. Estimates of intercept, regression coefficients, and coefficient of determination (R²) of the models to predict average PEVD of pairs of test groups
Table 2.4. Estimates of intercept, regression coefficients, and coefficient of determination (R²) of the models to predict average PEVD of each test group with all other test groups
Table 3.1. Coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects for different mating systems involving two breeds, A and B
Table 3.2. Distribution of observations among coefficients of direct (HD) and maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic effects
Table 3.3. Mean and standard deviation (SD) of pre-weaning gain (Gain), weaning age (Age), coefficients of direct and maternal breed additive, dominance (HD and HM), and epistatic loss (ED and EM) genetic effects
Table 3.4. Estimates of (co)variance components and genetic parameters of pre-weaning gain (kg)
Table 3.5. Multi-breed additive genetic changes in pre-weaning gain per year obtained through regression of average estimated breeding values of purebred calves on birth year (Average) and through regression of estimated breeding values on contribution of each breed to the breed composition of the calves (Regression)
Table 3.6. Estimates and standard errors of direct and maternal dominance (H) and epistatic loss (E) effects on pre-weaning gain (kg)
Table 3.7. Estimates (as deviations from Angus) and standard errors of direct and maternal breed additive effects for pre-weaning gain (kg)
Table 3.8. Sampling correlations among estimates of direct (D) and maternal (M) fixed genetic effects
Table 4.1. Correlation coefficients among predictor variables of direct (D) and maternal (M) fixed genetic effects (n = 478,466)
Table 4.2. Eigenvalues of the correlation matrix among predictor variables of fixed genetic effects and corresponding condition indices
Table 4.3. Decomposition of the variance structure of the parameter estimates associated with the two largest condition indices
Table 4.4. Values of the ridge parameter (K) obtained by ridge regression methods R1 and R2, for direct and maternal genetic effects
Table 4.5. Summary of results obtained over one hundred bootstrap samples for ordinary least squares (LS) and ridge regression methods R1 and R2
Table 4.6. Estimates of direct and maternal dominance (H) and epistatic loss (E) effects on pre-weaning gain (kg), obtained by ordinary least squares (LS) and ridge regression methods R1 and R2
Table 4.7. Estimates of direct and maternal breed additive effects on pre-weaning gain (kg), as deviations from Angus, obtained by ordinary least squares (LS) and ridge regression methods R1 and R2
Table 4.8. Number of calves including records from 1986 to the indicated year, expressed as equivalent purebred calves
Table 4.9. Values of the ridge parameter (K), obtained by ridge regression methods R1 and R2, using records from 1986 to 1996
LIST OF FIGURES
Figure 2.1. Average degree of connectedness for pairs of test groups (top) and for each test group with all other test groups (bottom) on the basis of PEVD, VED, CR, and GLT
Figure 2.2. Observed relationship of average PEVD per test group with number of bulls per test group, average PEVD per test group with number of sires per test group, CR with number of bulls per test group, and GLT with number of bulls per test group
Figure 2.3. Observed relationship of average PEVD of pairs of test groups with VED, CR, and GLT
Figure 2.4. Observed relationship of average PEVD of each test group with VED, CR, and GLT
Figure 2.5. Observed relationship of average PEVD of each test group with number of bulls per test group, VED, CR, and GLT for connected and disconnected test groups
Figure 3.1. Percentage of calves, sires, and dams with 1, 2, 3, or 4 breeds in the genetic composition in the dataset containing 478,466 calves, 19,908 sires, and 234,608 dams
Figure 3.2. Number of purebred and crossbred calves, sires, and dams containing some portion of the indicated breed in dataset including 478,466 calves, 19,908 sires, and 234,608 dams
Figure 3.3. Numbers of purebred and crossbred (expressed as equivalent to purebred) calves per breed
Figure 3.4. Multi-breed additive genetic changes in pre-weaning gain obtained through average breeding values of purebred calves per birth year (Average) and through regression of yearly breeding values on contribution of each breed to the breed composition of the calves (Regression)
Figure 4.1. Variance inflation factor (VIF) associated with predictor variables of direct and maternal dominance (H), epistatic loss (E), and breed additive effects
Figure 4.2. Convergence of the estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ridge regression method R1
Figure 4.3. Convergence of the estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ridge regression method R2
Figure 4.4. Variance inflation factor (VIF) associated with predictor variables of direct and maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2
Figure 4.5. Estimates (as deviations from AN) and standard errors of direct dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2
Figure 4.6. Estimates (as deviations from AN) and standard errors of maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and ridge regression methods R1 and R2
Figure 4.7. Sampling correlations (multiplied by –1.0) between estimates of maternal dominance (HM) and direct epistatic loss (ED) effects and between estimates of direct and maternal breed additive effects given by ordinary least squares (LS) and ridge regression methods R1 and R2
Figure 4.8. Pearson and Spearman correlations, and percentages of coincidence for different proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves on the basis of ABC yielded by different models compared to model ADE-R2
Figure 4.9. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ordinary least squares, using records from 1986 to the indicated year
Figure 4.10. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ridge regression method R1, using records from 1986 to the indicated year (ridge parameter K was obtained using records from 1986 to 1996)
Figure 4.11. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed additive effects (as deviations from AN), under ridge regression method R2, using records from 1986 to the indicated year (ridge parameter K was obtained using records from 1986 to 1996)
ABBREVIATIONS KEY
ABC = Across-breed estimated breeding value
AN = Angus
BD = Blonde d'Aquitaine
BEG = Bull estimated weight gain
BLUP = Best linear unbiased predictor
CH = Charolais
CI = Condition index
CR = Connectedness rating
D = Dominance effect
E = Epistatic loss effect
EBV = Estimated breeding value
ED = Coefficient of direct epistatic loss effect
EM = Coefficient of maternal epistatic loss effect
GLT = Total number of direct genetic links between test groups
GV = Gelbvieh
HD = Coefficient of direct dominance effect
HE = Hereford
HM = Coefficient of maternal dominance effect
LM = Limousin
LS = Ordinary least squares
MA = Maine-Anjou
MSE = Mean square error
MSEP = Mean squared error of prediction
PEV = Prediction error variance
PEVD = Average prediction error variance of the difference between EBVs
R1 = Generalized ridge estimator of Hoerl and Kennard (ridge regression method R1)
R2 = Bootstrap in combination with cross-validation (ridge regression method R2)
SA = Salers
SH = Shorthorn
SM = Simmental
TG = Test group
VED = Variance of estimated differences between test group effects
VIF = Variance inflation factor
Chapter 1
General Introduction
Genetic selection and planned crossbreeding systems are two complementary
strategies that have been applied in the beef cattle industry to generate animals with high
levels of production and efficiency under varying management conditions and market
preferences. Programs of genetic improvement taking advantage of between-breed
additive and non-additive genetic effects are now common worldwide.
A genetic goal is effectively accomplished by selection based on modern genetic
evaluation. Considering the importance of crossbreeding in beef cattle production, genetic
evaluation must consider animals of multiple breeds. Mixed model procedures,
employing an animal model, are generally used in the genetic evaluations of multi-breed
populations. To obtain highly accurate genetic evaluations and, consequently, a high
response to selection, breed additive and non-additive genetic effects must be properly
accounted for. Moreover, estimated breeding values of animals should be comparable
regardless of the breed composition and management units from which they come.
The present research focuses on some problems related to statistical methods applied
to the estimation of breeding values of animals in a multi-breed population of beef cattle,
more specifically:
(1) Estimation of the degree of connectedness among groups of centrally tested beef
bulls;
(2) Estimation of additive, dominance, and epistatic loss effects on pre-weaning gain
in crossing of different Bos taurus breeds; and
(3) Estimation of genetic effects in the presence of multicollinearity.
Central testing of beef bulls is an important component of genetic improvement
programs for beef cattle in many countries. Because selection is carried out across test
groups, evaluation of the degree of connectedness among test groups is of great concern.
With few genetic links between test groups, comparison of bulls’ EBV from different
groups is less accurate, even if the accuracy of the EBV is high within the groups
(Kennedy and Trus, 1993).
Different criteria for measuring connectedness have been proposed in the literature
(e.g., Wood et al., 1991; Foulley et al., 1992; Laloë, 1993; Kennedy and Trus, 1993; Fries,
1998; Hanocq and Boichard, 1999; Mathur et al., 2002). Ideally, PEV of comparisons
between animals or average PEV of comparisons between groups of animals (PEVD),
which is influenced by the average genetic relationship between and within management
units, should be the basis for measuring connectedness (Kennedy and Trus, 1993).
However, computing the PEV matrix is very difficult or impossible for large datasets. If
obtaining a measure of connectedness through PEVD is impossible, alternative methods
could be used to predict PEVD. In Chapter 2, three alternative methods are assessed and
compared with respect to prediction of PEVD. Models to predict PEVD, which could be
routinely used in genetic evaluation, are suggested. An indication of the degree of
connectedness among test groups of beef bulls in Ontario, Canada, is obtained. Results
from this investigation will be the basis for developing recommendations to increase the
accuracy of comparisons of bulls across test groups.
Across-herd genetic evaluation for growth traits is another significant component of
genetic improvement programs for beef cattle in many countries. Similar to genetic
evaluations of centrally tested beef bulls, across-herd genetic evaluations for growth traits
are based on additive-dominance genetic models. These models are justified based on the
assumption that heterosis is mainly due to dominance effects, in agreement with results
obtained by Gregory et al. (1997) in a large beef cattle crossbreeding experiment.
Heterosis is modeled as being proportional to the probability that genes at a locus come
from different breeds, which corresponds to the breed heterozygosity. Deviations from
the linear association of heterosis with degree of heterozygosity are due to recombination
loss (Dickerson, 1969, 1973). Recombination loss (epistatic loss) is attributed to the loss
of favourable epistatic combinations present in the gametes from purebreds as a result of
long-term selection. This loss is proportional to the probability that two non-allelic genes
randomly chosen in the individual are from different breeds. Because it is difficult to
estimate dominance and epistatic loss effects separately, research studies to estimate both
dominance and epistatic loss effects in beef cattle are not abundant, particularly with field
data. However, results obtained by Arthur et al. (1999) suggest that, when data structure
allows, the inclusion of epistatic effects in the genetic evaluation model can significantly
improve the accuracy of predictions.
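The two probabilities described above can be illustrated with a small sketch. It assumes breed compositions are given as fractions summing to one; the function names are hypothetical, and the second function is only a literal reading of "two non-allelic genes randomly chosen in the individual are from different breeds". The coefficients actually used in this thesis are defined in Chapter 3 and may be computed differently.

```python
def breed_heterozygosity(sire, dam):
    """Probability that the two alleles at a locus come from different breeds.

    sire, dam: dicts mapping breed code -> fraction of that breed (sums to 1).
    """
    breeds = set(sire) | set(dam)
    same_breed = sum(sire.get(b, 0.0) * dam.get(b, 0.0) for b in breeds)
    return 1.0 - same_breed


def offspring_composition(sire, dam):
    """Expected breed composition of the progeny (half from each parent)."""
    breeds = set(sire) | set(dam)
    return {b: 0.5 * (sire.get(b, 0.0) + dam.get(b, 0.0)) for b in breeds}


def different_breed_probability(composition):
    """Literal reading (an assumption): probability that two genes drawn
    independently from the individual's breed composition differ in breed."""
    return 1.0 - sum(p * p for p in composition.values())


angus = {"AN": 1.0}
hereford = {"HE": 1.0}
f1 = offspring_composition(angus, hereford)        # 1/2 AN, 1/2 HE
backcross = offspring_composition(f1, angus)       # 3/4 AN, 1/4 HE

print(breed_heterozygosity(angus, hereford))       # F1: 1.0
print(breed_heterozygosity(f1, f1))                # F2: 0.5
print(breed_heterozygosity(f1, angus))             # backcross: 0.5
print(different_breed_probability(backcross))      # 0.375
```

The F2 and the backcross share the same breed heterozygosity but differ in the second quantity, which is why the two kinds of effect are separable in principle yet hard to separate in field data, where few mating types are represented.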
Estimates of (co)variance components, heterosis, breed effects, and additive genetic
changes have been obtained in Ontario (Miller, 1996; Sullivan et al., 1999), but there
were no available studies which separated direct and maternal dominance and epistatic
loss effects associated with breed heterozygosities. An objective reported in Chapter 3
was to obtain estimates of direct and maternal breed additive, dominance, and epistatic
loss effects on pre-weaning weight gain. (Co)variance components were also obtained
and breed additive genetic changes between 1986 and 1999 were examined. Estimates
obtained in this study can be used to update the parameters currently used in the genetic
evaluations to improve accuracy.
To fit breed additive, dominance, and epistatic loss effects, a multiple regression
equation can be used whose predictor variables include breed compositions, breed
heterozygosities, and functions of the heterozygosities. This has generally been done
by ordinary least squares. The interpretation of the estimates
given by ordinary least squares depends on the assumption that predictor variables are not
strongly interrelated. If the vectors of predictor variables are multicollinear, the least
squares estimates typically have large standard errors, may have signs that are opposite to
what would be expected, and are sensitive to changes in the data file and to addition or
deletion of variables in the model, making modeling very confusing. Moreover, when
taken in combination, the estimated coefficients often cancel out, indicating confounding.
In the presence of multicollinearity, the least squares estimator is not adequate because it
will be very unstable. Multicollinearity has been indicated as one of the main causes of
unexpected signs and high degree of confounding involving estimates of direct and
maternal breed additive and/or non-additive genetic effects (e.g., Kinghorn and Vercoe,
1989; Rodríguez-Almeida et al., 1997; Fries et al., 2000; Cassady et al., 2002), which can
lead to the incorrect ranking of animals based on across breed comparisons.
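As a rough illustration of how strongly interrelated predictors can be detected, the sketch below computes two standard diagnostics for the simplest case of two predictors: the variance inflation factor, 1 / (1 - r²), and the condition index, the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix (1 + |r| and 1 - |r| for two predictors). The data are made-up values and the function names are hypothetical; the diagnostics applied to the full set of predictors are described in Chapter 4.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_predictor_diagnostics(x1, x2):
    """Return (r, VIF, condition index) for a pair of predictors.

    Assumes the predictors are not perfectly correlated (|r| < 1).
    """
    r = pearson(x1, x2)
    vif = 1.0 / (1.0 - r * r)
    # Eigenvalues of [[1, r], [r, 1]] are 1 + |r| and 1 - |r|.
    condition_index = math.sqrt((1.0 + abs(r)) / (1.0 - abs(r)))
    return r, vif, condition_index


# Made-up example: two nearly collinear predictor columns.
x1 = [0.00, 0.25, 0.50, 0.75, 1.00]
x2 = [0.02, 0.24, 0.51, 0.77, 0.98]
r, vif, ci = two_predictor_diagnostics(x1, x2)
print(f"r = {r:.4f}, VIF = {vif:.1f}, condition index = {ci:.1f}")
```

With uncorrelated predictors both diagnostics equal 1; they grow without bound as the correlation approaches 1, which is the situation in which least squares estimates become unstable.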
To overcome the difficulties caused by multicollinearity, Hoerl and Kennard (1970a,
1970b) suggested the use of the ridge regression estimator. With a suitable choice of the
ridge parameter, the ridge regression estimator gives a more precise estimate of
regression coefficients because its variance and mean squared error are smaller than those
of the least squares estimator. The fact that ridge regression estimators have been
successfully applied in dealing with multicollinearity in diverse fields, including
Chemistry, Econometrics, and Engineering (Gruber, 1998) suggests avenues for research
and application in the context of animal breeding, particularly in the analysis of multi-
breed populations of beef cattle.
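A minimal sketch of the ridge estimator for two standardized predictors may help fix ideas. In correlation form the estimator solves (R + kI)b = r_xy, where R is the correlation matrix of the predictors, r_xy holds their correlations with the response, and k is the ridge parameter; k = 0 gives ordinary least squares. The correlations used below are invented for illustration, and the function name is hypothetical.

```python
def ridge_two_predictors(r12, r1y, r2y, k):
    """Solve (R + k*I) b = r_xy for two standardized predictors.

    r12: correlation between the predictors
    r1y, r2y: correlations of each predictor with the response
    k: ridge parameter (k = 0 reproduces ordinary least squares)
    """
    a = 1.0 + k
    det = a * a - r12 * r12          # determinant of [[a, r12], [r12, a]]
    b1 = (a * r1y - r12 * r2y) / det
    b2 = (a * r2y - r12 * r1y) / det
    return b1, b2


# Invented example: two almost collinear predictors, both positively
# correlated with the response.
r12, r1y, r2y = 0.99, 0.70, 0.66

ls = ridge_two_predictors(r12, r1y, r2y, k=0.0)
ridge = ridge_two_predictors(r12, r1y, r2y, k=0.1)
print("least squares:", ls)      # large coefficients with opposite signs
print("ridge, k=0.1: ", ridge)   # smaller coefficients, both positive
```

In this toy case least squares assigns a negative coefficient to a predictor that is positively correlated with the response, the kind of unexpected sign mentioned above, and a small k removes it at the cost of some bias.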
Chapter 4 presents the development of a framework, using ridge regression methods,
for obtaining stable estimates of direct and maternal breed additive, dominance, and
epistatic loss effects on pre-weaning gain when multicollinearity is of concern, which
could contribute to more accurate multi-breed genetic evaluation of beef cattle. After
identifying the causes of dependencies among predictor variables, two generalized ridge
regression methods were applied in the choice of the ridge parameter. Once the choice of
the ridge parameter was made, its reliability and validity were evaluated through
bootstrap resampling procedures in combination with cross-validation. Finally, some
results obtained with ridge regression methods were examined to further illustrate
application of ridge regression in routine large-scale genetic evaluations.
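The general idea of checking a ridge parameter by resampling can be sketched as follows: draw a bootstrap sample, fit the ridge regression on it, and measure the squared prediction error on the records left out of that sample. This is only an illustration with one predictor and invented data; it is not the R2 procedure of Chapter 4, and all names are hypothetical.

```python
import random


def fit_ridge_line(x, y, k):
    """Ridge fit with one predictor; the intercept is not penalized."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    denom = sxx + k
    slope = sxy / denom if denom > 0 else 0.0  # guard for degenerate samples
    return mean_y - slope * mean_x, slope


def bootstrap_msep(x, y, k, n_boot=100, seed=1):
    """Average squared prediction error on records left out of each bootstrap sample."""
    rng = random.Random(seed)
    n = len(x)
    errors = []
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]
        left_out = sorted(set(range(n)) - set(drawn))
        if not left_out:
            continue  # nothing to validate on in this replicate
        intercept, slope = fit_ridge_line([x[i] for i in drawn],
                                          [y[i] for i in drawn], k)
        squared = [(y[i] - (intercept + slope * x[i])) ** 2 for i in left_out]
        errors.append(sum(squared) / len(squared))
    return sum(errors) / len(errors)


# Invented data: a nearly linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
for k in (0.0, 1.0, 1000.0):
    print(k, bootstrap_msep(x, y, k))
```

Comparing the resulting MSEP across candidate values of k gives a data-driven check on whether a chosen ridge parameter predicts better than least squares (k = 0) or shrinks too much.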
The final chapter is a general discussion of results obtained in the previous chapters.
Some practical implications of the results of this study, limitations, and suggestions for
future research are presented.
Chapter 2
Degree of connectedness among groups of
centrally tested beef bulls
V. M. Roso, F. S. Schenkel, and S. P. Miller
Published in Canadian Journal of Animal Science 2004 84: 37-47
Reproduced by permission of the Agricultural Institute of Canada
ABSTRACT - The degree of connectedness among test groups (TG) of bulls tested in
central evaluation stations from 1988 to 2000 in Ontario, Canada, was evaluated using the
methods PEVD, VED, CR, and GLT. The model used in the analysis included the effects
of breed and TG (fixed) and animal (random). PEVD was assumed to be the most adequate
measure of connectedness and results from the alternative methods VED, CR, and GLT
were compared relative to PEVD. Models to predict the average PEVD of pairs of TG
and the average PEVD of each TG with all other TG on the basis of VED, CR, and GLT
were developed. Results from all measures of connectedness indicated an unfavourable
trend in the degree of connectedness after 1994. The average PEVD of pairs of TG can be
better predicted on the basis of the model that includes GLT. The average PEVD of each
TG with all other TG can be better predicted on the basis of models that include either CR
or GLT. Connectedness among TG of centrally tested beef bulls can be adequately
assessed for specific pairs of TG or overall for each TG with all other TG using GLT.
Key words: accuracy, central test, genetic evaluation, harmonic mean
Abbreviations: BEG, bull estimated weight gain; CR, connectedness rating; EBV,
estimated breeding value; VED, variance of estimated differences between test group
effects; GLT, total number of direct genetic links between test groups; PEV, prediction
error variance; PEVD, average prediction error variance of the difference between
estimated breeding values; TG, test group.
INTRODUCTION
Connectedness among test groups (TG) is of interest in genetic evaluation of station-
tested beef bulls because comparisons of estimated breeding values (EBV) of bulls tested
in different groups are made. The EBV of bulls from different TG are comparable due to
use of appropriate methodology (Best Linear Unbiased Predictor, BLUP) and genetic
connectedness among groups. However, the accuracy of the comparisons depends upon
the degree of connectedness among TG. With lower connectedness between TG,
comparison of bulls’ EBV from different TG is less accurate, even if the accuracy of EBV
is high within the groups (Kennedy and Trus, 1993).
When genetic evaluation is under an animal model, connections occur through
additive genetic relationships. Hence, two TG could be connected by direct and/or
indirect genetic links. Kennedy and Trus (1993) argued that the most appropriate measure
of connectedness is the average prediction error variance of differences (PEVD) in EBV
between animals in different management units (e.g., TG), which is influenced by the
average genetic relationship between and within management units. However, computing
this statistic is extremely time consuming and not feasible for routine application.
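For a pair of animals i and j, the prediction error variance of the difference between their EBV is PEV_i + PEV_j - 2 PEC_ij, where PEC_ij is the prediction error covariance. The sketch below averages this quantity over all pairs of bulls from two test groups. It assumes the PEV matrix is already available as a small nested list and uses a simple arithmetic mean; the matrix values and names are invented, and the exact averaging used in this study is described in the Material and Methods.

```python
def average_pevd(pev, groups, tg_a, tg_b):
    """Average PEV of EBV differences between bulls of two test groups.

    pev: symmetric matrix (list of lists) of prediction error (co)variances
    groups: test group label of each animal, in the same order as pev
    """
    idx_a = [i for i, g in enumerate(groups) if g == tg_a]
    idx_b = [i for i, g in enumerate(groups) if g == tg_b]
    diffs = [pev[i][i] + pev[j][j] - 2.0 * pev[i][j]
             for i in idx_a for j in idx_b]
    return sum(diffs) / len(diffs)


# Invented 4 x 4 PEV matrix for two bulls in TG1 and two in TG2.
pev = [
    [0.8, 0.1, 0.3, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.3, 0.0, 0.7, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
groups = ["TG1", "TG1", "TG2", "TG2"]
print(average_pevd(pev, groups, "TG1", "TG2"))  # approximately 1.55
```

The positive covariance between the first and third bulls lowers the PEV of their difference, which is how genetic links between groups translate into more accurate comparisons. The practical obstacle is obtaining the PEV matrix itself, which requires inverting the coefficient matrix of the mixed model equations.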
When PEVD cannot be computed, Kennedy and Trus (1993) proposed to use the
variance of estimated differences between management unit effects (VED), which was
highly correlated with PEVD in their simulation study. Mathur et al. (1999) also
suggested that VED could be used as a measure of connectedness between two
management units and proposed to calculate the connectedness rating (CR), defined as
the correlation between estimated effects of two management units. According to Mathur et
al. (1999), CR is less dependent on the size and structure of management units than VED.
For calculating CR, the authors proposed an iterative method, which captures the inverse
elements for some rows and columns (corresponding to TG in the mixed model
equations, for example) of any large matrix for which a direct inverse is not possible.
Fries (1998) proposed the use of number of direct genetic links between TG (GLT) due to
common sires and dams as a method for measuring degree of connectedness among TG.
The objectives of this study were:
(1) To obtain an indication of the degree of connectedness of test groups of beef bulls
in Ontario,
(2) To assess and compare the methods VED, CR and GLT for measuring the degree
of connectedness among groups of station-tested beef bulls, and
(3) To define a model to predict the PEVD of pairs of test groups and the average
PEVD of each TG with all other TG, which could be routinely used in genetic evaluation
programs.
MATERIAL AND METHODS
Data
Data were consecutive weights of bulls tested in central evaluation stations in
Ontario, Canada, from 1988 to 2000. Bulls from multiple breeds and crossbreds, from
different herds, were delivered to test stations and submitted to an adjustment period of
28 days before start of test. Bulls were weighed every 28 days during a period of 112 or
140 days on test. A summary of the data is presented in Table 2.1.
Statistical model
Consecutive weights of bulls were used to obtain the estimated weight gain (BEG). A
fixed univariate linear regression of weight ($w_{ij}$) on days on test ($d_{ij}$) was estimated
for each bull i, using the model $w_{ij} = \alpha_i + \beta_i d_{ij} + e_{ij}$, where $\alpha_i$ and $\beta_i$ are the
intercept and linear regression coefficient of the ith bull, respectively, and $e_{ij}$ is the
random residual term. The BEG was calculated by multiplying $\beta_i$ by the number of days on
test (140 days) and was adjusted for heterosis on the basis of the individual bull’s
heterozygosity. An ad hoc heterosis of 3% was assumed for an animal with heterozygosity of
100%, regardless of the breeds involved (Sullivan et al., 1999). Then, BEG was used as an
observation in the following genetic evaluation model:
$$BEG_{ij} = \sum_{k=1}^{14} b_k B_{ik} + g_j + a_{ij} + e_{ij},$$
where
$BEG_{ij}$ is the estimated weight gain of the ith bull in the jth TG;
$b_k$ is the linear regression coefficient on the breed composition for the kth breed;
$B_{ik}$ is the contribution of the kth breed to the breed composition of the ith bull;
$g_j$ is the fixed effect of the jth TG;
$a_{ij}$ is the random additive genetic effect of the ith bull in the jth TG;
$e_{ij}$ is the random residual effect.
Random effects a and e were assumed independent with covariance matrices equal to
$A\sigma^2_a$ and $I\sigma^2_e$, respectively. All available pedigree information was incorporated into the
additive numerator relationship matrix A. The required elements for calculating VED, CR
and PEVD were obtained using PEST (Groeneveld, 1990), assuming ad hoc heritability
of 0.43 (Sullivan et al., 1999), which was previously estimated for the same data set.
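The derivation of BEG described above can be illustrated with a minimal Python sketch. It fits the per-bull regression of weight on days on test and scales the slope to 140 days. The function name, the example bull, and in particular the multiplicative form of the heterosis adjustment are assumptions made for illustration only; the text states the 3% value but not the adjustment formula.

```python
import numpy as np

TEST_LENGTH_DAYS = 140          # from the text: slope is scaled to 140 days on test
HETEROSIS_AT_FULL_HET = 0.03    # from the text: 3% heterosis at 100% heterozygosity


def estimated_gain(days_on_test, weights, heterozygosity=0.0):
    """Return (BEG, heterosis-adjusted BEG) for one bull.

    heterozygosity is a fraction between 0 and 1.
    """
    days = np.asarray(days_on_test, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Least squares fit of w = alpha + beta * d; polyfit returns [beta, alpha].
    beta, alpha = np.polyfit(days, w, deg=1)
    beg = beta * TEST_LENGTH_DAYS
    # ASSUMPTION: multiplicative adjustment; the source does not give the formula.
    adjusted = beg / (1.0 + HETEROSIS_AT_FULL_HET * heterozygosity)
    return beg, adjusted


# Hypothetical crossbred bull weighed every 28 days, gaining exactly 1.7 kg/day.
days = [0, 28, 56, 84, 112, 140]
weights = [300 + 1.7 * d for d in days]
beg, beg_adj = estimated_gain(days, weights, heterozygosity=1.0)
print(round(beg, 2), round(beg_adj, 2))
```

With real data the weights would not lie exactly on a line, and the fitted slope would smooth out weighing-day noise such as differences in gut fill.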
Measures of the degree of connectedness
The degree of connectedness among TG was measured using the following methods:
(1) Prediction error variance of differences in EBV of bulls ($\hat{a}$) from different test
groups (PEVD). The PEVD of two animals, one from the ith and the other from the jth TG,
was given by
$$PEVD_{ij} = \mathrm{var}(\hat{a}_i - a_i) + \mathrm{var}(\hat{a}_j - a_j) - 2\,\mathrm{cov}(\hat{a}_i - a_i,\ \hat{a}_j - a_j).$$
(2) Variance of estimated differences between test group effects ($\hat{g}$) (VED). The
VED between the ith and the jth TG was given by
$$VED_{ij} = \mathrm{var}(\hat{g}_i) + \mathrm{var}(\hat{g}_j) - 2\,\mathrm{cov}(\hat{g}_i,\ \hat{g}_j).$$
(3) Connectedness rating (CR), defined as the correlation between estimated effects
of TG (Mathur et al. 2002). The CR between the ith and the jth TG was given by
$$CR_{ij} = \frac{\mathrm{cov}(\hat{g}_i,\ \hat{g}_j)}{\sqrt{\mathrm{var}(\hat{g}_i)\,\mathrm{var}(\hat{g}_j)}} \times 100.$$
(4) Total number of direct genetic links between test groups (GLT), defined as the
links between TG due to common sires and dams (Fries, 1998). The basic steps of the
algorithm and the criteria used for computing GLT are:
1. Calculate the number of direct genetic links between pairs of TG due to common
sires and dams. Then, for each TG, calculate the overall number of genetic links
due to sires (GLs) and dams (GLd) with all other TG.
2. Calculate the total number of genetic links (GLT) as the sum of GLs and GLd.
3. Identify the TG with the largest GLT (“main TG”).
4. Identify all TG directly and/or indirectly connected to the “main TG”. These groups
constitute the “principal mass”. TG with fewer than 10 GLT and/or fewer than three
different parents (sires + dams) were considered disconnected from the “principal mass”
and had their GLT set to zero. Other criteria could be used.
5. Repeat step 4 until the set of connected TG is the same as in the previous run.
6. Save records that were considered as connected to the “principal mass”. TG
disconnected from the “principal mass” have GLT equal to zero and should be rerun
through the program. This procedure allows identification of isolated subsets of
connected TG.
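The steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the program used in the study. In particular, counting one link per parent shared by a pair of TG, recomputing GLT within the remaining set of TG at each pass, and all function and variable names are assumptions; the exact link-counting rule of Fries (1998) is not given in the text.

```python
from collections import defaultdict, deque

MIN_LINKS = 10     # threshold from the text
MIN_PARENTS = 3    # threshold from the text


def pair_links(records, groups):
    """Count links per TG pair, one link per shared parent (assumption)."""
    parent_groups = defaultdict(set)      # parent id -> TG where it has progeny
    for tg, sire, dam in records:
        if tg in groups:
            for parent in (sire, dam):
                if parent is not None:
                    parent_groups[parent].add(tg)
    links = defaultdict(int)              # (tg_a, tg_b) -> number of links
    parents = defaultdict(set)            # tg -> parents linking it to other TG
    for parent, tgs in parent_groups.items():
        tgs = sorted(tgs)
        for i, a in enumerate(tgs):
            for b in tgs[i + 1:]:
                links[(a, b)] += 1
                parents[a].add(parent)
                parents[b].add(parent)
    return links, parents


def glt_principal_mass(records, min_links=MIN_LINKS, min_parents=MIN_PARENTS):
    """Return {TG: GLT}; TG outside the principal mass get GLT = 0."""
    all_groups = {tg for tg, _, _ in records}
    active = set(all_groups)
    while True:
        links, parents = pair_links(records, active)
        glt = defaultdict(int)
        neighbours = defaultdict(set)
        for (a, b), n in links.items():   # steps 1 and 2: total links per TG
            glt[a] += n
            glt[b] += n
            neighbours[a].add(b)
            neighbours[b].add(a)
        ok = {tg for tg in active
              if glt[tg] >= min_links and len(parents[tg]) >= min_parents}
        if not ok:
            return {tg: 0 for tg in all_groups}
        main = max(ok, key=lambda tg: (glt[tg], str(tg)))   # step 3: "main TG"
        seen, queue = {main}, deque([main])                 # step 4: traversal
        while queue:
            for nb in neighbours[queue.popleft()]:
                if nb in ok and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen == active:                                  # step 5: stable set
            return {tg: (glt[tg] if tg in seen else 0) for tg in all_groups}
        active = seen


# Hypothetical records: (test group, sire, dam); thresholds lowered for the toy data.
records = [
    ("A", "s1", "d1"), ("A", "s2", "d2"), ("A", "s3", "d3"),
    ("B", "s1", "d4"), ("B", "s2", "d5"),
    ("C", "s2", "d6"), ("C", "s3", "d7"),
    ("D", "s9", "d8"),
    ("E", "s3", "d9"),
]
result = glt_principal_mass(records, min_links=2, min_parents=2)
```

In this toy example TG D has no shared parents and TG E is linked through a single sire, so both end with GLT equal to zero, while A, B and C form the principal mass. Rerunning the function on the records of the zeroed TG would correspond to step 6.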
The average PEVD was assumed as the basic measure of connectedness of a TG,
following Kennedy and Trus (1993). This statistic was considered the most appropriate
measure of connectedness and the alternative methods VED, CR, and GLT were
compared relative to PEVD. The degree of connectedness was calculated for pairs of TG
and for each TG with all other TG. Connectedness between pairs of TG indicates
accuracy in comparing EBV of animals from two TG. Average connectedness of each TG
with all others indicates the average accuracy in comparing EBV of an animal with
animals in all other TG. This measure is of greater importance in the evaluation of beef
bulls in a station test because selection generally considers all TG instead of a few very
well connected TG. High average connectedness of each TG with all other TG allows
effective selection across all TG.
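As an illustration of the definitions of PEVD, VED and CR, the following Python sketch computes the three statistics from small, hypothetical covariance matrices. It assumes that the prediction error (co)variances of the EBV and the (co)variances of the estimated TG effects have already been obtained, for example from the inverse of the mixed model coefficient matrix; the matrices, function names and index conventions below are invented for the example.

```python
import numpy as np


def pevd(pev, i, j):
    """PEV of the difference between the EBV of animals i and j."""
    return pev[i, i] + pev[j, j] - 2.0 * pev[i, j]


def average_pevd(pev, animals_i, animals_j):
    """Average PEVD over all pairs of animals from two test groups."""
    return float(np.mean([pevd(pev, a, b) for a in animals_i for b in animals_j]))


def ved(var_g, i, j):
    """Variance of the estimated difference between TG effects i and j."""
    return var_g[i, i] + var_g[j, j] - 2.0 * var_g[i, j]


def cr(var_g, i, j):
    """Connectedness rating: correlation between estimated TG effects, times 100."""
    return 100.0 * var_g[i, j] / np.sqrt(var_g[i, i] * var_g[j, j])


# Hypothetical prediction error (co)variances for three bulls
# (bulls 0 and 1 in one TG, bull 2 in another).
pev = np.array([[600.0, 50.0, 20.0],
                [50.0, 580.0, 10.0],
                [20.0, 10.0, 620.0]])
# Hypothetical (co)variances of the two estimated TG effects.
var_g = np.array([[400.0, 4.0],
                  [4.0, 100.0]])

print(average_pevd(pev, [0, 1], [2]), ved(var_g, 0, 1), cr(var_g, 0, 1))
```

A larger covariance between the two estimates lowers PEVD and VED and raises CR, which is why small PEVD and VED and large CR all indicate better connectedness.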
As previously indicated, the GLT of a TG is the number of direct genetic links of the
TG with all other TG. Obviously many pairs of TG that do not have any direct genetic
links are indirectly connected and, consequently, can have high accuracy of comparisons
of EBV between them. For this reason, the number of direct genetic links between pairs
of TG is inadequate to indicate the degree of connectedness between pairs of TG. The
arithmetic mean of GLT of each pair of TG is also inadequate because pairs of TG with
equal arithmetic mean can have very different degrees of connectedness. A potentially
adequate measure of connectedness between pairs of TG could be obtained through the
harmonic mean of the GLT. This measure has the property of discriminating among pairs
of TG with different GLT, penalizing those expected to be more poorly connected. As a
consequence, better relationship between PEVD with harmonic means than with
arithmetic means of GLT may be expected. The harmonic mean of GLT was used in the
prediction of average PEVD of pairs of TG. The harmonic mean of GLT of TG i and TG j
($\overline{GLT}_{ij}$) was given by
$$\overline{GLT}_{ij} = \frac{2}{\dfrac{1}{GLT_i} + \dfrac{1}{GLT_j}},$$
where $GLT_i$ and $GLT_j$ are the GLT of the ith and jth TG with all other TG, respectively.
The harmonic mean is always smaller than the arithmetic mean unless the GLT of the
two TG are identical. When the GLT of a TG was equal to zero, which means the TG is
not connected to the “principal mass”, a harmonic mean equal to zero was assumed.
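The penalizing property of the harmonic mean can be seen in a short Python sketch. The function name and the example values are illustrative; the zero convention follows the rule stated above.

```python
def harmonic_mean_glt(glt_i, glt_j):
    """Harmonic mean of the GLT of two test groups.

    Returns 0 when either TG has GLT equal to zero, i.e. is not
    connected to the principal mass.
    """
    if glt_i == 0 or glt_j == 0:
        return 0.0
    return 2.0 / (1.0 / glt_i + 1.0 / glt_j)


# Two hypothetical pairs with the same arithmetic mean (500).
balanced = harmonic_mean_glt(500, 500)      # both TG well linked
unbalanced = harmonic_mean_glt(100, 900)    # one TG poorly linked
isolated = harmonic_mean_glt(0, 700)        # one TG disconnected
print(balanced, round(unbalanced, 6), isolated)
```

Both pairs have an arithmetic mean of 500, but the harmonic mean of the unbalanced pair is about 180, reflecting the weaker of its two TG.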
The statistical analyses to define the models for predicting PEVD were performed
using the general linear models procedure (GLM) of the SAS statistical software (SAS
Institute Inc., 1990). The R2 of the models and the level of significance (P < 0.05) of each
effect considered were the criteria used to determine the final models. When segmented
polynomial regressions were used, the knots (junction points between segments) were
determined based on maximization of R2 of the model.
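The knot selection described above can be sketched as a grid search. The example below uses NumPy least squares rather than SAS GLM, which was the software actually used, and the data are synthetic; it only illustrates how a knot may be chosen by maximizing R2 for a quadratic-quadratic segmented regression. All names and values are assumptions.

```python
import numpy as np


def design(x, knot):
    """Columns: 1, x, x^2 and Z, where Z = 0 below the knot and (x - knot)^2 above."""
    z = np.where(x < knot, 0.0, (x - knot) ** 2)
    return np.column_stack([np.ones_like(x), x, x ** 2, z])


def r_squared(x, y, knot):
    """Coefficient of determination of the segmented regression for one knot."""
    X = design(x, knot)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()


def best_knot(x, y, candidates):
    """Return the candidate knot with the largest R2."""
    scores = [r_squared(x, y, k) for k in candidates]
    return candidates[int(np.argmax(scores))]


# Synthetic data generated with a true knot at x = 4 (no noise).
x = np.linspace(0.0, 10.0, 101)
y = 5.0 - 2.0 * x + 0.3 * x ** 2 - 0.3 * np.where(x < 4.0, 0.0, (x - 4.0) ** 2)
knot = best_knot(x, y, [2.0, 3.0, 4.0, 5.0, 6.0])
print(knot)
```

In practice the candidate grid would span the observed range of the predictor (CR or GLT), and with noisy data neighbouring knots may give nearly identical R2.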
RESULTS
Connectedness
The average values of the degree of connectedness among TG based on PEVD and the
alternative measures VED, CR, and GLT were 1599 ± 58, 286 ± 132, 1.23 ± 1.28, and
707 ± 503 for pairs of TG and 1726 ± 41, 286 ± 93, 1.21 ± 0.51 and 709 ± 690 for each
TG with all other TG, respectively. The overall results over the years are depicted in
Figure 2.1. Small values of PEVD and VED, and large values of CR and GLT are
desirable, because they indicate higher levels of connectedness among TG. All measures
of connectedness showed the same trend, that is, an increase in the degree of
connectedness from 1988 to 1994 and a substantial decrease after 1994. The highest
PEVD and VED, and the smallest CR and GLT were observed in 2000 (last year with
available information at the time of this research).
Prediction of PEVD on the basis of VED, CR, and GLT
Correlations among PEVD, VED, CR and GLT for pairs of TG and for averages of
TG with all other TG are presented in Table 2.2. In general the correlations had moderate
to high magnitude. The correlation between PEVD and VED was 0.71 both for pairs of
TG and average per TG, in contrast with the almost perfect correlation obtained by
Kennedy and Trus (1993) in their simulation study. The coefficient of correlation
measures only the strength of the linear relationship between two variables. Because a
better indication of the true relationship of PEVD with the alternative methods was
needed for defining the models to predict PEVD, the observed relationships between
PEVD and the other variables were analyzed graphically. The relationship of PEVD with
both number of bulls and number of sires per TG was also analyzed.
As shown in Figure 2.2, the observed relationship of average PEVD per TG with both
number of bulls and number of sires per TG had the same pattern. By observation, TG
with more than approximately 40 bulls or 20 sires were associated with values of PEVD
smaller than 1750, otherwise TG showed large variation in PEVD. The variation depends
on the genetic relationship between groups, which is not a direct function of number of
bulls or number of sires per TG. Because TG with a small number of bulls or a small
number of sires showed large variation in PEVD, which indicates large variation in the
degree of connectedness of these groups, neither number of bulls (size of the group) nor
number of sires per TG is a good predictor of the degree of connectedness between TG.
Figure 2.2 shows also the relationship of both CR and GLT with number of bulls per
TG. Although a large variation in the degree of connectedness was indicated by PEVD
when the size of TG was small, CR was strongly associated with number of bulls per TG
over the whole range of TG size. CR decreased linearly when the size of TG became
smaller than approximately 40 bulls. Mathur et al. (2002) reported a similar trend in the
application of CR for measuring connectedness in the Canadian Centre for Swine
Improvement. VED seemed to be even more dependent on the size of TG than CR:
TG with fewer than 25 bulls were associated with increasingly higher VED (data not
shown). In contrast, GLT showed large variation across the range of TG sizes
(Figure 2.2). Even small TG had large GLT, which could result in these TG having high
accuracy of comparisons.
The observed relationships of PEVD with VED, CR and GLT are depicted in Figure
2.3 for pairs of TG and in Figure 2.4 for the average of each TG with all other TG. In
both cases, PEVD and VED were linearly, but not strongly, associated. On the other
hand, the relationships of both CR and GLT with PEVD were curvilinear. When GLT of
pairs of TG were represented by their arithmetic mean, large variation in PEVD was
observed at the same level of GLT (Figure 2.3). However, when GLT of pairs of TG were
represented by their harmonic mean, a stronger relationship with PEVD was observed.
Therefore, in the prediction of PEVD of pairs of TG, superior results can be expected
using harmonic mean instead of arithmetic mean of GLT. Figures 2.3 and 2.4 also
indicate that averages of CR smaller than approximately one and GLT smaller than
approximately 250 per TG were associated with increasingly higher PEVD.
The information provided by the correlations and graphical analyses was used to
define the models for predicting PEVD. Initially, VED, CR, GLT, number of bulls per
TG, number of sires per TG, and the ratio of number of bulls per sire per TG were
considered. In the final models, however, only those with a significant effect (P < 0.05)
were kept.
The final models to predict the average PEVD of pairs of TG and the average PEVD
of each TG with all other TG based on VED, CR, and GLT were the following:
(1) Average PEVD of pairs of TG
(1a) On the basis of VED
The observed average PEVD of pairs of TG was modeled by a linear regression on
VED and a quadratic regression on the ratio of harmonic means of number of bulls and
number of sires of pairs of TG.
$$PEVD_{ij} = \alpha + \beta_1 VED_{ij} + \beta_2 (NB/S)_{ij} + \beta_3 (NB/S)_{ij}^2 + e_{ij},$$
where
$PEVD_{ij}$ is the observation of the average PEV of the difference between EBV of bulls
in the ith TG and EBV of bulls in the jth TG;
$\alpha$ is the intercept;
$VED_{ij}$ is the variance of estimated differences between the ith and the jth TG;
$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the
ith and jth TG;
$\beta_1$, $\beta_2$ and $\beta_3$ are the regression coefficients;
$e_{ij}$ is the residual associated with PEVD of the ith and jth TG.
(1b) On the basis of CR
The observed average PEVD of pairs of TG was modeled using a quadratic-quadratic
polynomial regression on CR and a quadratic regression on the ratio of harmonic means
of number of bulls and number of sires of pairs of TG.
$$PEVD_{ij} = \alpha + \beta_1 CR_{ij} + \beta_2 CR_{ij}^2 + \beta_3 Z + \beta_4 (NB/S)_{ij} + \beta_5 (NB/S)_{ij}^2 + e_{ij},$$
where
$\alpha$ is the intercept;
$CR_{ij}$ is the connectedness rating between the ith and the jth TG;
$Z = 0$ if $CR_{ij} < 1.9$, or $Z = (CR_{ij} - 1.9)^2$ otherwise;
$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the
ith and jth TG;
$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are the regression coefficients;
$e_{ij}$ is the residual associated with PEVD of the ith and jth TG.
(1c) On the basis of GLT
The observed average PEVD of pairs of TG was modeled using a quadratic-quadratic
polynomial regression on the harmonic mean of GLT of pairs of TG and a quadratic
regression on the ratio of harmonic means of number of bulls and number of sires of pairs
of TG.
$$PEVD_{ij} = \alpha + \beta_1 \overline{GLT}_{ij} + \beta_2 \overline{GLT}_{ij}^2 + \beta_3 Z + \beta_4 (NB/S)_{ij} + \beta_5 (NB/S)_{ij}^2 + e_{ij},$$
where
$\alpha$ is the intercept;
$\overline{GLT}_{ij}$ is the harmonic mean of the GLT of the ith and the jth TG;
$Z = 0$ if $\overline{GLT}_{ij} < 550$, or $Z = (\overline{GLT}_{ij} - 550)^2$ otherwise;
$(NB/S)_{ij}$ is the ratio of harmonic means of number of bulls and number of sires in the
ith and jth TG;
$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are the regression coefficients;
$e_{ij}$ is the residual associated with PEVD of the ith and jth TG.
(2) Average PEVD of each TG with all other TG
(2a) On the basis of VED
The observed average PEVD of each TG with all other TG was modeled by a linear
regression on VED and a quadratic regression on number of sires per TG.
$$PEVD_i = \alpha + \beta_1 VED_i + \beta_2 S_i + \beta_3 S_i^2 + e_i,$$
where
$\alpha$ is the intercept;
$PEVD_i$ is the observation of the average PEV of the difference between EBV of bulls
in the ith TG and EBV of bulls in all other TG;
$VED_i$ is the average variance of estimated differences between the ith TG and all other
TG;
$S_i$ is the number of sires represented in the ith TG;
$\beta_1$, $\beta_2$ and $\beta_3$ are the regression coefficients;
$e_i$ is the residual associated with PEVD of the ith TG.
(2b) On the basis of CR
The observed average PEVD of each TG with all other TG was modeled using a
quadratic-quadratic polynomial regression on CR, a quadratic regression on number of
sires, and a quadratic regression on the ratio of number of bulls per sire.
$$PEVD_i = \alpha + \beta_1 CR_i + \beta_2 CR_i^2 + \beta_3 Z + \beta_4 S_i + \beta_5 S_i^2 + \beta_6 (NB/S)_i + \beta_7 (NB/S)_i^2 + e_i,$$
where
$\alpha$ is the intercept;
$CR_i$ is the average connectedness rating of the ith TG with all other TG;
$Z = 0$ if $CR_i < 1.15$, or $Z = (CR_i - 1.15)^2$ otherwise;
$S_i$ is the number of sires represented in the ith TG;
$(NB/S)_i$ is the average ratio of number of bulls per sire represented in the ith TG;
$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$, $\beta_6$ and $\beta_7$ are the regression coefficients;
$e_i$ is the residual associated with PEVD of the ith TG.
(2c) On the basis of GLT
The observed average PEVD of each TG with all other TG was modeled using a
quadratic-quadratic-quadratic polynomial regression on GLT, a linear regression on
number of sires and a quadratic regression on the ratio of number of bulls per sire.
$$PEVD_i = \alpha + \beta_1 GLT_i + \beta_2 GLT_i^2 + \beta_3 Z_1 + \beta_4 Z_2 + \beta_5 S_i + \beta_6 (NB/S)_i + \beta_7 (NB/S)_i^2 + e_i,$$
where
$\alpha$ is the intercept;
$GLT_i$ is the total number of direct genetic links between the ith TG and all other TG;
$Z_1 = 0$ if $GLT_i < 200$, or $Z_1 = (GLT_i - 200)^2$ otherwise;
$Z_2 = 0$ if $GLT_i < 800$, or $Z_2 = (GLT_i - 800)^2$ otherwise;
$S_i$ is the number of sires represented in the ith TG;
$(NB/S)_i$ is the average ratio of number of bulls per sire represented in the ith TG;
$\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$, $\beta_6$ and $\beta_7$ are the regression coefficients;
$e_i$ is the residual associated with PEVD of the ith TG.
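To make the structure of the segmented models concrete, the following Python sketch builds the regressors of model (2c) for given values of GLT, number of sires and bulls per sire. The knots of 200 and 800 are taken from the definitions of Z1 and Z2 above; the function names and example inputs are hypothetical.

```python
import numpy as np


def truncated_square(x, knot):
    """Z = 0 if x < knot, and (x - knot)^2 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x < knot, 0.0, (x - knot) ** 2)


def design_glt_per_tg(glt, sires, bulls_per_sire, knots=(200.0, 800.0)):
    """Design matrix of model (2c).

    Columns: intercept, GLT, GLT^2, Z1, Z2, S, NB/S, (NB/S)^2.
    """
    glt = np.asarray(glt, dtype=float)
    sires = np.asarray(sires, dtype=float)
    nbs = np.asarray(bulls_per_sire, dtype=float)
    return np.column_stack([
        np.ones_like(glt),
        glt,
        glt ** 2,
        truncated_square(glt, knots[0]),
        truncated_square(glt, knots[1]),
        sires,
        nbs,
        nbs ** 2,
    ])


# Two hypothetical test groups: one well linked, one poorly linked.
X = design_glt_per_tg(glt=[1000.0, 100.0], sires=[20, 5], bulls_per_sire=[2.0, 3.0])
print(X)
```

Below the first knot both Z1 and Z2 are zero, so the regression on GLT reduces to a plain quadratic; each knot adds a quadratic correction that applies only beyond it, which keeps the fitted curve continuous and smooth at the junctions.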
Estimates of parameters and coefficient of determination (R2) of the models are
presented in Table 2.3 for prediction of average PEVD of pairs of TG and in Table 2.4 for
prediction of average PEVD of each TG with all other TG. The R2 of the models to
predict average PEVD of each TG were higher than the R2 of the models to predict PEVD
of pairs of TG on the basis of VED, CR and GLT. These results were expected because
extreme values observed in the pairwise comparisons were averaged out, reducing the
variation in PEVD.
The R2 of the model to predict average PEVD of pairs of TG on the basis of VED was
equal to 0.53 and VED accounted for 51% (partial R2) of the total variation in PEVD. In
the model to predict average PEVD of pairs of TG on the basis of CR, the R2 was equal to
0.50 and CR accounted for 49% of total variation in PEVD. R2 of 0.72 was obtained in
the model that considered GLT, which accounted for 71% of the total variation in average
PEVD (Table 2.3).
In the models to predict average PEVD of each TG with all other TG, the R2 of the
model based on VED was equal to 0.55 and VED accounted for 50% of the total variation
in PEVD. In the model to predict PEVD on the basis of CR, the R2 was equal to 0.82 and
CR accounted for 73% of total variation in PEVD. R2 of 0.79 was obtained in the model
that considered GLT, which accounted for 76% of the total variation in PEVD (Table
2.4). The R2 increased to 0.82 when GLT also included the genetic links due to
grandparents (data not shown).
Simulation of disconnected test groups
In the data set, on the basis of GLT, there was only one completely disconnected TG.
Thus, to evaluate the effect of complete disconnectedness, 36 TG had sire and dam
identifications modified to generate completely disconnected TG, covering a range of TG
sizes from very small to large (6 to 183 bulls).
Because there were no relationships among bulls within the created disconnected TG,
accuracy of bull EBV from disconnected TG would increase only with the size of the
group. Figure 2.5 shows that increasing the size of disconnected groups reduced the
average PEVD of each TG with all other TG from 1950, in a group with only 6 bulls, to
an asymptotic minimum value of around 1850, when 120 bulls were in the TG. Kennedy
and Trus (1993) showed that relationships among bulls within disconnected TG would
increase the PEV of comparisons of EBV across TG. Therefore, connected TG with
average PEVD greater than or equal to 1850 would behave similarly to large
disconnected TG of unrelated bulls with respect to PEVD.
Disconnected TG were easily identified through GLT because it was equal to zero.
However, the VED and CR of those disconnected TG varied between 164 and 739 and
between 0.27 and 1.10, respectively (Figure 2.5). Therefore, completely disconnected TG
presented a large range of VED and CR values and cannot be distinguished from
connected TG.
DISCUSSION
The genetic evaluation of bulls tested in central evaluation stations in Ontario,
Canada, is currently performed using an individual animal model. With such a model,
connections among TG occur through additive genetic relationships. Accurate
comparison of estimated breeding values between animals in different groups is necessary
to provide reliable ranking of animals across TG. The accuracy of comparison between
animals in different TG is higher if groups are well connected.
For a bull test station to operate in Ontario some requirements based on minimal
number of bulls (12) and minimal number of sires (4) per TG are observed. Nevertheless,
results of the current study have shown that these requirements were not sufficient to
maintain a high level of connectedness among TG.
Kennedy and Trus (1993) stated that PEV of comparisons between animals or average
PEV of comparisons between groups of animals (PEVD) should be the basis of the
measurement of connectedness. However, computing the PEV matrix is very difficult or
impossible for large data sets. Approximate methods for obtaining diagonal elements of
the PEV matrix of large data sets have been developed (Misztal and Wiggans, 1988;
Meyer, 1989), but they generally do not provide the required off-diagonal elements to
obtain PEVD. If obtaining a measure of connectedness through PEVD is not possible,
alternative methods could be used to predict PEVD and, consequently, provide a measure
of degree of connectedness among management units.
Different criteria for measuring connectedness have been proposed in the literature.
Wood et al. (1991) compared the effectiveness of different breeding programs for
evaluation of pigs in test stations, using only the diagonal elements of the PEV matrix to
measure connectedness. Foulley et al. (1992) proposed calculating the ratio of the
determinants of PEV matrices with and without management unit (e.g., TG) in the model.
Laloë (1993) extended the concept of individual coefficient of determination for
measuring the overall precision of a genetic evaluation using linear mixed model
methodology. However, the use of such criteria becomes impossible if the analysis
involves a large number of animals. In this case, approximations or simplifications
similar to those presented by Foulley et al. (1992) were suggested. The concept of
coefficient of determination was also used by Hanocq and Boichard (1999) for measuring
connectedness among breeding studs in the French Holstein cattle population. However,
none of these measurements of connectedness were feasible for implementation in very
large-scale genetic evaluation.
In the current investigation three alternative measures of connectedness (VED, CR,
and GLT) were studied and used in models to predict PEVD. Models with CR and GLT
produced better results than the model with VED in the prediction of average PEVD of
each TG with all other TG, explaining high proportions of the total variance in PEVD.
Comparing the partial coefficient of determination, GLT accounted for a higher
proportion of PEVD variability than VED and CR. The number of sires per TG
and the ratio of number of bulls per sire had a small impact on PEVD. In the prediction of
average PEVD of pairs of TG, GLT was clearly superior to VED and
CR.
Most (94.5%) of the genetic links between TG were due to common
sires. An additional analysis, in which GLT also considered the genetic links due to
common grandparents, showed a small increase (3 percentage points, from 0.79 to 0.82) in
the R2 of the model to predict average PEVD of each TG on the basis of GLT. These results
suggested that the most
important relationships were accounted for via common sires and dams among TG, in
agreement with Hanocq and Boichard (1999). For considering other generations in the
calculation of GLT, the extra computational cost versus the increase in the accuracy of
prediction of PEVD must be evaluated. In the present study the direct genetic links due to
common sires and dams were enough to provide a sufficiently accurate prediction of
PEVD, and the increased accuracy obtained through additional
generations did not compensate for the increased computing cost.
When completely disconnected TG were simulated, VED, CR, and GLT showed a
different pattern. On the basis of VED and CR, it was not possible to differentiate
completely disconnected TG from connected ones, because large disconnected TG had
VED and CR values that overlapped those from connected TG. Meanwhile, EBV of bulls
from completely disconnected TG should not be compared with EBV of bulls from other
TG, except when it is possible to assume that genetic levels among management units are
identical. In general, this strong assumption does not hold in industry-wide genetic
evaluation.
Although VED and CR are computationally less demanding than PEVD, the effort to
calculate these statistics is still substantial, which can jeopardize the application of these
methods when a very large number of TG is involved. Because GLT is less demanding, it
could easily be calculated routinely.
The use of GLT allowed the identification of disconnected TG (without genetic links).
Hence, an assessment of the quality of the connectedness of a TG could potentially be
obtained before beginning an evaluation by calculating GLT. GLT is less dependent on
the size of TG and would not necessarily favour a large TG, because relatively small TG
may have large GLT and, consequently, low average PEVD.
In this study the degree of connectedness among TG was evaluated using PEVD,
VED, CR and GLT. Results obtained by all measures of connectedness indicated that TG
are becoming less connected and, consequently, the accuracy of comparisons of EBV of
bulls in different TG is decreasing. The period after 1994 was markedly poorer with
regard to connectedness, reaching the worst level in 2000 (last year evaluated). The
beginning of this period coincides with a significant change in the structure of bull testing
in Ontario, when larger stations running under contract with the Ontario Ministry of
Agriculture and Food were replaced with private groups. These private groups commonly
represent fewer herds and they tend to be smaller and less connected than their contract
predecessors.
From the predicted PEVD on the basis of VED, CR, and GLT it is possible to
anticipate that increasing the values of VED and decreasing the values of CR and GLT in
relation to those observed in 2000, would cause a reduction in the accuracy of
comparisons and, consequently, potential genetic gain would be compromised. To
reverse the current trend in connectedness and increase the accuracy of
comparisons, recommendations must be developed. Increasing the use of common sires
with high genetic values can increase connectedness among TG, besides promoting
genetic improvement among herds. In addition, GLT could be rapidly determined when
groups of bulls are formed and decisions could be made to increase the number of genetic
links among TG, allowing accurate comparison of EBV across TG.
Kennedy and Trus (1993) showed that connectedness increases with relationship
across groups, while it decreases when the within group relationship increases. Similar
results were observed by Hanocq and Boichard (1999). The increase of genetic
connectedness among TG reduces PEV of comparison of animals in different TG.
However, according to Kennedy and Trus (1993), “minimization of PEV does not
necessarily maximize rate of genetic improvement because it may come at a cost of
reduced intensity of selection associated with selection among related as opposed to
unrelated individuals”. Therefore, to maximize genetic gain, equilibrium between
connectedness and intensity of selection should be attained.
The methods for measuring connectedness evaluated in the current investigation are
dependent on the particular structure of the data. Further studies using other bull test data
sets with different structures are warranted.
CONCLUSIONS
The current trend in the accuracy of comparisons of bulls tested in different test
groups in Ontario is not favourable. All measures of connectedness studied showed a
decrease in the degree of connectedness among test groups after 1994.
Average PEVD of pairs of test groups can be more accurately predicted on the basis
of the model that includes GLT than on the basis of models that include VED or CR.
Average PEVD of each test group with all other TG can be more accurately predicted on
the basis of models that include either CR or GLT.
GLT is not computationally demanding and allows completely disconnected test groups
to be distinguished from connected ones. For these reasons, GLT seems
to be a good alternative for routinely measuring the degree of connectedness
among test groups, with the aim of improving the accuracy of comparison of bulls’ EBV
across test groups in central evaluation stations.
Table 2.1. Summary of the bull test data
Number of bulls 26,068
Number of animals in the pedigree 58,826
Number of test groups 583
Number of breeds 14
Number of purebred bulls 23,279
Number of crossbred bulls 2,789
Number of bulls per test group a 45 ± 36
Number of sires per test group 23 ± 21
Number of test groups per year 45 ± 10
Average starting age (days) 240 ± 23
Average BEG (kg) b 238 ± 37
a Average ± standard deviation.
b Bull estimated weight gain.
Table 2.2. Correlations among PEV of the difference between EBV of bulls from
different test groups (PEVD), variance of estimated differences between test group
effects (VED), connectedness rating (CR) and total number of direct genetic links
between test groups (GLT) for pairs of test groups (above diagonal) and for
averages of each test group with all other test groups (below diagonal)
         PEVD     VED      CR       GLT
PEVD      -       0.71    –0.45    –0.71
VED      0.71      -      –0.51    –0.68
CR      –0.70    –0.85      -       0.55
GLT     –0.66    –0.64     0.86      -
Table 2.3. Estimates of intercept, regression coefficients, and coefficient of
determination (R2) of the models to predict average PEVD of pairs of test groups
On the basis of VED             | On the basis of CR              | On the basis of GLT
Intercept  1452.3404 ± 0.6481   | Intercept  1747.2603 ± 0.6727   | Intercept  1698.5458 ± 0.4365
VED a         0.4036 ± 0.0012   | CR         –181.3726 ± 0.5776   | GLT          –0.4745 ± 0.0013
NB/S          2.6160 ± 0.0319   | CR2          46.8462 ± 0.1801   | GLT2          0.0004 ± 0.0000
(NB/S)2      –0.0271 ± 0.0004   | Z           –46.7829 ± 0.1845   | Z            –0.0004 ± 0.0000
-                               | NB/S        –12.9617 ± 0.3354   | NB/S          6.7573 ± 0.2472
-                               | (NB/S)2       1.1632 ± 0.0473   | (NB/S)2      –0.4526 ± 0.0351
R2            0.53              | R2            0.50              | R2            0.72
R2 b          0.51              | R2 b          0.49              | R2 b          0.71
P < 0.0001 for all parameters.
a VED = variance of estimated differences between pairs of test groups.
NB/S = ratio of harmonic means of number of bulls and number of sires for pairs of test
groups.
CR = connectedness rating between pairs of test groups.
GLT = harmonic mean of total number of direct genetic links of pairs of test groups.
Z = knot (junction point between segments) of polynomial regressions.
b % of the total variation accounted for by VED, CR, or GLT (partial R2).
Table 2.4. Estimates of intercept, regression coefficients, and coefficient of
determination (R2) of the models to predict average PEVD of each test group with
all other test groups
On the basis of VED             | On the basis of CR               | On the basis of GLT
Intercept  1578.2396 ± 0.4255   | Intercept  1966.5089 ± 1.9429    | Intercept  1819.6820 ± 5.3913
VED a         0.4361 ± 0.0221   | CR         –480.9942 ± 23.2170   | GLT          –0.8373 ± 0.0540
S             1.6320 ± 0.2540   | CR2         152.4630 ± 12.4006   | GLT2          0.0017 ± 0.0001
S2           –0.0146 ± 0.0027   | Z          –123.7831 ± 16.9157   | Z1           –0.0017 ± 0.0002
-                               | S             3.774370 ± 0.2724  | Z2           –0.0001 ± 0.0000
-                               | S2           –0.0218 ± 0.0030    | S             0.5275 ± 0.0702
-                               | NB/S         18.0660 ± 1.9416    | NB/S          1.8331 ± 0.3681
-                               | (NB/S)2       0.9151 ± 0.1805    | (NB/S)2      –0.0196 ± 0.0075
R2            0.55              | R2            0.82               | R2            0.79
R2 b          0.50              | R2 b          0.73               | R2 b          0.76
P < 0.0001 for all parameters.
a VED = average variance of estimated differences of each test group with all other test
groups.
S = number of sires represented in each test group.
CR = average connectedness rating of each test group with all other test groups.
NB/S = average ratio of number of bulls per sire represented in each test group.
GLT = total number of direct genetic links between each test group and all other test
groups.
Z, Z1, and Z2 = knots (junction points between segments) of polynomial regressions.
b % of the total variation accounted for by VED, CR, or GLT (partial R2).
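As a worked illustration of how the estimates in Table 2.4 could be applied, the Python sketch below evaluates the VED-based model for a single test group. The coefficients are copied from the table; the function name is hypothetical, and the inputs (VED of 286 and 23 sires) are simply the averages reported in the text and in Table 2.1, chosen for illustration. The VED model is used here because its coefficients are least affected by rounding; the published GLT coefficients are rounded to four decimals, which makes predictions from the quadratic terms imprecise at large GLT.

```python
# Coefficients copied from Table 2.4, model "On the basis of VED".
INTERCEPT = 1578.2396
B_VED = 0.4361
B_S = 1.6320
B_S2 = -0.0146


def predict_pevd_from_ved(ved, n_sires):
    """Predicted average PEVD of a test group with all other test groups."""
    return INTERCEPT + B_VED * ved + B_S * n_sires + B_S2 * n_sires ** 2


# Illustrative inputs: average VED per TG (286) and average number of sires (23).
example = predict_pevd_from_ved(286.0, 23)
print(round(example, 2))
```

The prediction for these inputs is about 1733, close to the average PEVD of 1726 reported for each TG with all other TG.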
[Figure 2.1 graphic not reproduced. Four panels plotted yearly averages from 1988 to 2000: PEVD with VED, and CR with GLT, first for pairs of test groups and then for each test group with all other test groups.]
Figure 2.1. Average degree of connectedness for pairs of test groups (top) and for each test
group with all other test groups (bottom) on the basis of PEVD, VED, CR, and GLT
Figure 2.2. Observed relationship of average PEVD per test group with number of bulls
per test group, average PEVD per test group with number of sires per test group, CR with
number of bulls per test group, and GLT with number of bulls per test group
Figure 2.3. Observed relationship of average PEVD of pairs of test groups with VED,
CR, and GLT
Figure 2.4. Observed relationship of average PEVD of each test group with VED, CR,
and GLT
[Figure 2.5 panels (image not reproduced): PEVD (kg²) plotted against number of bulls per TG, VED (kg²), CR, and GLT; series: Disconnected and Connected.]
Figure 2.5. Observed relationship of average PEVD of each test group with number of bulls
per test group, VED, CR, and GLT for connected and disconnected test groups
Chapter 3
Additive, dominance, and epistatic loss effects
on pre-weaning gain in crossing of different Bos
taurus breeds
ABSTRACT - Objectives of this study were to estimate variance components, direct and
maternal breed additive, dominance, and epistatic loss effects, and additive genetic
changes for pre-weaning gain (kg). Data were from 478,466 animals from beef herds
enrolled with Beef Improvement Ontario (BIO), from 1986 to 1999, including records of
both purebred and crossbred animals from Angus, Blonde d’Aquitaine, Charolais,
Gelbvieh, Hereford, Limousin, Maine-Anjou, Salers, Shorthorn, and Simmental breeds.
The genetic model used in the analysis included fixed genetic effects of breed,
dominance, and epistatic loss, fixed environmental effects of age of the calf,
contemporary group, and age of the dam by sex of the calf, random additive direct and
maternal genetic effects, and random maternal permanent environment effect.
Coefficients of direct and maternal dominance effects were equal to expected direct and
maternal breed heterozygosities, respectively. Coefficients of direct and maternal epistatic
loss effects were average expected breed heterozygosities in the uniting gametes that
generated an individual. Variance components were estimated by REML. Genetic
changes of Angus, Charolais, Hereford, Limousin, and Simmental were obtained using
two approaches: through regression of average breeding values of purebred animals on
birth year, obtained separately for each breed, and through the within year regression of
breeding values on the contribution of each breed to the animals. Estimates of direct and
maternal additive genetic, maternal permanent environmental, and residual variances,
expressed as proportions of the phenotypic variance, were 0.32, 0.20, 0.12, and 0.52,
respectively. Annual additive genetic changes were positive for all breeds. Results from
the two approaches used to estimate genetic changes suggest that producers used animals
of substantially higher additive genetic value to produce purebred Charolais, Hereford,
and Simmental than to produce crossbred animals. Breeds ranked similarly to what was
expected, but estimates of both direct and maternal effects showed large standard errors.
Both direct and maternal dominance had a favourable effect (P < 0.05) on pre-weaning
gain, equivalent to 1.31% and 2.28% of the phenotypic mean, respectively. The corresponding
values for direct and maternal epistatic loss effects were –2.19% (P < 0.05) and –0.08%
(P > 0.05), respectively.
Key words: beef cattle, genetic trends, heterosis, variance components.
Abbreviations: AN, Angus; BD, Blonde d’Aquitaine; CH, Charolais; E, epistatic loss
effect; ED, coefficient of direct epistatic loss effect; EM, coefficient of maternal epistatic
loss effect; GV, Gelbvieh; H, dominance effect; HD, coefficient of direct dominance
effect; HE, Hereford; HM, coefficient of maternal dominance effect; LM, Limousin; MA,
Maine-Anjou; SA, Salers; SH, Shorthorn; SM, Simmental.
INTRODUCTION
Pre-weaning gain is an economically important trait that receives considerable
attention in the multi-breed genetic evaluation of beef cattle in many countries. Both
direct and maternal effects contribute to the growth of young beef cattle. To obtain a
reliable ranking of animals in the genetic evaluation of a multi-breed population, both
additive and non-additive genetic effects have to be accounted for (Arthur et al., 1999). Non-
additive effects are represented by dominance and epistatic effects, which result from
intra- and inter-locus interactions, respectively. Both dominance and epistatic effects are
components of heterosis in crossbred animals. Estimates of such effects should be
obtained from the dataset used to evaluate the animals, provided that there are enough
records to generate reliable estimates.
In beef cattle improvement programs, dominance effects associated with breed
heterozygosity are generally taken into account in the estimation of breeding values of
crossbred animals. Additive-dominance models, which simultaneously estimate additive
and heterotic effects or estimate additive effect after pre-adjustment of records for
heterosis on the basis of breed heterozygosity, are standard models. These models have
been used in large beef cattle populations in Canada (Miller, 1996; Sullivan et al., 1999),
Brazil (Roso and Fries, 1998), Australia (Johnston et al., 1999), and the USA (Pollak and
Quaas, 1998; Klei et al., 2002).
The justification for additive-dominance models is based on the assumption that
heterosis is mainly due to dominance effects, in agreement with results obtained in large
beef cattle crossbreeding experiments conducted at the United States Department of
Agriculture Meat Animal Research Center, Clay Center, Nebraska (Gregory et al., 1997).
According to these authors, the heterosis observed in growth traits of beef cattle is likely
due to dominance effects of genes and represents the recovery of accumulated inbreeding
depression within populations that have been genetically isolated from each other for
many generations. Studies of Gregory et al. (1997) suggested that retention of heterosis is
linearly proportional to heterozygosity. A similar relationship between heterosis and
heterozygosity was observed by Arthur et al. (1999) and Fries et al. (2000). In these two
latter studies, however, the authors suggested that another component, the epistatic loss
effect, could be added to the additive-dominance model to provide a better explanation of
the genetic differences between animals of different breed compositions. The epistatic
loss in crossbred animals represents the effect due to the breakdown of favourable
interactions between loci existent in purebred animals, which have been built by both
natural and artificial selection within breeds (Koch et al., 1985).
Crossbreeding is a common practice in the beef industry. Because an important
objective of crossbreeding is to take advantage of breed additive and between-breed non-
additive genetic effects, analysis of additive, dominance and epistatic loss effects is
important when evaluating commercial cattle.
Objectives of this study were to estimate variance components, direct and maternal
breed additive, dominance, and epistatic loss effects, and breed additive genetic changes
for pre-weaning gain in a typical multi-breed population of beef cattle.
MATERIAL AND METHODS
Data
The data used in this study were pre-weaning weight gains of animals from beef herds
enrolled with Beef Improvement Ontario (BIO), from 1986 to 1999. The dataset after
preliminary edits consisted of 869,050 records, including records of both purebred and
crossbred animals. A subset of purebred and crossbred animals from the 10 most popular
breeds, including Angus (AN), Blonde d’Aquitaine (BD), Charolais (CH), Gelbvieh (GV),
Hereford (HE), Limousin (LM), Maine-Anjou (MA), Salers (SA), Shorthorn (SH), and
Simmental (SM), was used in the analysis. Some animals had a fraction of the breed
composition from an undetermined breed, which was treated as another breed, named
Unknown (UN). Only records of animals with complete information for calculating direct
and maternal dominance and epistatic loss coefficients (described later) were kept.
Connectedness analysis
An analysis to check for connectedness among contemporary groups (herd-year-
season-management group) across breeds was performed. The method used was the total
number of direct genetic links between contemporary groups due to common sires and
dams (GLT), which was described in Chapter 2. Contemporary groups with more than 10
calves and with at least 10 direct genetic links and two classes of direct or maternal
heterozygosities (described later) were considered connected and retained for the
analysis. The resulting dataset included 23,059 contemporary groups, 478,466 calves,
19,908 sires, and 234,608 dams. A pedigree file of 714,220 animals was used in the
analysis.
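The retention rule described above can be sketched as a simple filter. This is only an illustration: the record fields and the function name are hypothetical, the GLT counts are taken as already computed (Chapter 2), and the reading that two classes of either direct or maternal heterozygosity are sufficient is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ContemporaryGroup:
    """Summary of one herd-year-season-management group (hypothetical fields)."""
    group_id: str
    n_calves: int
    n_genetic_links: int  # GLT: direct links through common sires and dams
    n_hd_classes: int     # distinct classes of direct heterozygosity
    n_hm_classes: int     # distinct classes of maternal heterozygosity


def is_retained(cg: ContemporaryGroup) -> bool:
    """Apply the three retention criteria described in the text."""
    enough_calves = cg.n_calves > 10          # more than 10 calves
    enough_links = cg.n_genetic_links >= 10   # at least 10 direct genetic links
    # Assumption: two classes of either direct or maternal heterozygosity suffice.
    enough_classes = cg.n_hd_classes >= 2 or cg.n_hm_classes >= 2
    return enough_calves and enough_links and enough_classes


# Made-up groups, each failing a different criterion except the first.
groups = [
    ContemporaryGroup("cg1", n_calves=25, n_genetic_links=40, n_hd_classes=3, n_hm_classes=1),
    ContemporaryGroup("cg2", n_calves=10, n_genetic_links=40, n_hd_classes=3, n_hm_classes=2),
    ContemporaryGroup("cg3", n_calves=30, n_genetic_links=4, n_hd_classes=2, n_hm_classes=2),
    ContemporaryGroup("cg4", n_calves=30, n_genetic_links=12, n_hd_classes=1, n_hm_classes=1),
]
retained = [cg.group_id for cg in groups if is_retained(cg)]
```

In this made-up example only the first group satisfies all three criteria.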
Predictor variables of fixed genetic effects
(1) Breed additive effects
Coefficients for direct and maternal breed additive effects were equal to the genetic
contribution of each breed to the breed composition of the calf and to the breed
composition of the dam, respectively. The estimates of direct and maternal breed additive
effects were expressed as differences relative to Angus.
Breed compositions of the animals are depicted in Figures 3.1 and 3.2. Figure 3.1
shows that less than 40% of the calves were purebred, clearly indicating that commercial
beef herds prefer crossbred to straightbred calves. Among the crossbred calves, most
originated from two-breed crosses. In contrast, most sires (89.3%) and dams
(61.3%) were purebred. Figure 3.2 shows that breeding practices in the commercial beef
herds studied resulted in an unbalanced number of animals among breeds. There were
substantially larger numbers of Angus, Charolais, Hereford, Limousin, and Simmental
calves, sires, and dams than Blonde d’Aquitaine, Gelbvieh, Maine-Anjou, Salers, and
Shorthorn. A considerable number of calves, sires, and dams (21.29, 7.87, and 15.62%,
respectively) had some portion of unknown breed in the breed composition with average
portion of unknown breed equal to 18%, 16%, and 40%, respectively. These animals were
kept in the analysis because they provided useful information for estimating other effects
considered in the genetic model.
(2) Dominance effects
Coefficients of direct (HD) and maternal (HM) dominance effects were equal to
expected direct and maternal breed heterozygosities, respectively. HD and HM were
calculated using the following equations:
$$H_D = 1 - \sum_{i=1}^{nb} S_i \times D_i$$
and
$$H_M = 1 - \sum_{i=1}^{nb} MGS_i \times MGD_i,$$
where nb is the number of breeds (11), and Si, Di, MGSi, and MGDi are the fractions of
the ith breed for the sire, dam, maternal grandsire, and maternal granddam breed
composition, respectively.
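As a minimal sketch of these two coefficients, the function below computes the expected breed heterozygosity from the breed compositions of two parents. The function name and the dictionary representation of breed composition are illustrative assumptions, not part of the original analysis.

```python
def breed_heterozygosity(parent1: dict, parent2: dict) -> float:
    """Expected breed heterozygosity: 1 - sum over breeds of p1_i * p2_i."""
    breeds = set(parent1) | set(parent2)
    same_breed = sum(parent1.get(b, 0.0) * parent2.get(b, 0.0) for b in breeds)
    return 1.0 - same_breed


# Hypothetical breed compositions (fractions sum to 1 within each animal).
angus = {"AN": 1.0}
charolais = {"CH": 1.0}
f1 = {"AN": 0.5, "CH": 0.5}

# HD uses the sire and dam of the calf.
hd_purebred = breed_heterozygosity(angus, angus)    # AN sire x AN dam
hd_f1 = breed_heterozygosity(angus, charolais)      # AN sire x CH dam
hd_backcross = breed_heterozygosity(angus, f1)      # AN sire x F1 dam
hd_f2 = breed_heterozygosity(f1, f1)                # F1 sire x F1 dam

# HM uses the maternal grandsire and maternal granddam, i.e. the parents of the dam.
hm_f1_dam = breed_heterozygosity(angus, charolais)  # dam is an AN x CH F1
```

The same function serves for HD and HM; only the pair of ancestors passed to it changes.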
(3) Epistatic loss effects
For estimating epistatic loss effects, it was assumed that the parents of an individual
produce more recombinant gametes the larger their breed heterozygosities. Thus, the
coefficients for direct (ED) and maternal (EM) epistatic loss effects were calculated as the
average breed heterozygosities in uniting gametes that generated the individual (Fries et
al., 2000). Epistatic loss will be proportional to the average heterozygosity observed in
parents and will be maximum when both parents of an individual are F1s. ED and EM
were calculated as:
ED = 0.5 (HSire + HDam)
and
EM = 0.5 (HMGS + HMGD),
where HSire, HDam, HMGS, and HMGD are the expected breed heterozygosities of the sire,
dam, maternal grandsire, and maternal granddam, respectively. The average epistatic loss
due to the breakdown of all kinds of gene interactions, as deviation from the average
additive and dominance effects, will be estimated by ED and EM (Fries et al., 2002).
Table 3.1 shows coefficients of direct and maternal dominance and epistatic genetic
effects for different mating systems involving two breeds, A and B.
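The epistatic loss coefficients can be illustrated for a few two-breed matings. This is a sketch under stated assumptions: the function name and the constants are hypothetical, and the parental heterozygosities are the expected values for purebred and F1 animals in a two-breed system.

```python
def epistatic_loss_coefficient(h_parent_a: float, h_parent_b: float) -> float:
    """Average expected breed heterozygosity of two parents: 0.5 * (Ha + Hb)."""
    return 0.5 * (h_parent_a + h_parent_b)


# Expected breed heterozygosity of each kind of parent in a two-breed system.
H_PUREBRED = 0.0
H_F1 = 1.0

# ED uses the heterozygosities of the sire and dam of the calf.
ed_f1_calf = epistatic_loss_coefficient(H_PUREBRED, H_PUREBRED)  # A sire x B dam
ed_backcross = epistatic_loss_coefficient(H_PUREBRED, H_F1)      # A sire x F1 dam
ed_f2_calf = epistatic_loss_coefficient(H_F1, H_F1)              # F1 sire x F1 dam

# EM uses the heterozygosities of the maternal grandsire and granddam.
em_f1_dam = epistatic_loss_coefficient(H_PUREBRED, H_PUREBRED)   # dam is an F1
```

The coefficient reaches its maximum of 1.0 only when both parents are F1s, which agrees with the statement above.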
The distribution of observations among coefficients of dominance and epistatic loss
effects is presented in Table 3.2. For ease of presentation, coefficients of dominance and
epistatic loss effects were grouped in classes of 0.125, ranging from zero to one. Numbers
in Table 3.2 suggest that there was a better distribution of observations among classes of
coefficients of direct and maternal dominance than among classes of coefficients of direct
and maternal epistatic loss effects. Because only approximately 10% of the sires are
crossbred (Figure 3.1), there were relatively few observations in the classes of
coefficients of epistatic loss effects larger than 0.625. The mean and standard deviation of
pre-weaning gain and predictor variables considered in the analysis are presented in Table
3.3.
Genetic analysis
The genetic model for pre-weaning gain, defined in matrix notation, was:
y = Xb + Fv + Za + Wm + Sp + e,
where
y = vector of observations;
b = vector of fixed genetic effects. This vector included direct and maternal breed
additive, dominance, and epistatic loss effects;
v = vector of fixed environmental effects. This vector included age of the calf as a
covariate (linear and quadratic effects), and age of the dam by sex of the calf and
contemporary group (herd-year-season-management group) as classification variables;
a = vector of random direct additive genetic effects;
m = vector of random maternal additive genetic effects;
p = vector of random maternal permanent environment, and
e = vector of random residual effects.
X, F, Z, W, and S are incidence matrices relating records to fixed genetic, fixed
environmental, direct genetic, maternal genetic, and permanent environment effects,
respectively.
The vectors of random effects a, m, p, and e were assumed to have (co)variance
matrices equal to $A\sigma_a^2$, $A\sigma_m^2$, $I\sigma_p^2$, and $I\sigma_e^2$, respectively, where A is the additive
numerator relationship matrix among animals and I is an identity matrix. Covariance
between a and m was assumed equal to $A\sigma_{am}$. Homogeneity of variances, the same
dominance and epistatic loss effects for crosses of different pairs of breeds, and no
interactions between genetic and environmental effects were assumed.
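For reference, the (co)variance assumptions stated above can be written compactly as follows. This is a sketch that additionally assumes p and e are uncorrelated with each other and with a and m, which is the usual convention for this type of model but is not stated explicitly in the text.

```latex
\[
\operatorname{Var}\begin{bmatrix} a \\ m \\ p \\ e \end{bmatrix}
=
\begin{bmatrix}
A\sigma_a^2 & A\sigma_{am} & 0 & 0 \\
A\sigma_{am} & A\sigma_m^2 & 0 & 0 \\
0 & 0 & I\sigma_p^2 & 0 \\
0 & 0 & 0 & I\sigma_e^2
\end{bmatrix},
\qquad
\operatorname{Var}(y) = ZAZ'\sigma_a^2 + WAW'\sigma_m^2
 + (ZAW' + WAZ')\sigma_{am} + SS'\sigma_p^2 + I\sigma_e^2 .
\]
```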
Estimates of (co)variance components ($\sigma_a^2$, $\sigma_m^2$, $\sigma_p^2$, $\sigma_e^2$, and $\sigma_{am}$) and estimates of the
effects included in the model were obtained using the DMU program (Madsen and
Jensen, 2000). First, (co)variance components were estimated by the restricted maximum
likelihood method, using a data subset containing 300,002 records from randomly
sampled herds, which overcame computational limitations. Given the estimated
(co)variance components, the estimates of the effects in the model were obtained using
the complete dataset.
Multi-breed additive genetic changes
To estimate genetic changes for the breeds with the largest number of records (Angus,
Charolais, Hereford, Limousin, and Simmental), two different approaches were used:
(1) Regression of average estimated breeding values of purebred calves on birth year,
computed separately for each breed, and
(2) Regression of estimated breeding values on contribution of each breed to the breed
composition of the calves in a given birth year (Klei et al., 2002; Elzo et al., 2004).
The regression approach (2) for calculating the yearly means of each breed used
information of both purebred and crossbred animals. Thus, the regression coefficient
obtained for each breed, in every year, accounted for additive genetic changes due to
alleles coming from both purebred and crossbred animals. Differences between breed
regression coefficients and yearly average estimated breeding values of purebred animals
were calculated to determine the genetic contribution of crossbred animals.
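Approach (2) can be sketched as a least-squares regression of estimated breeding values on breed fractions within one birth year. The data below are made up, and fitting the regression without an intercept (because breed fractions sum to one within each calf) is an assumption of this sketch rather than a documented detail of the analysis.

```python
import numpy as np


def yearly_breed_coefficients(breed_fractions: np.ndarray, ebv: np.ndarray) -> np.ndarray:
    """Regress EBV on breed fractions for the calves of one birth year.

    breed_fractions: (n_calves, n_breeds) array, rows summing to 1.
    ebv: (n_calves,) array of estimated breeding values (kg).
    Returns one regression coefficient per breed.
    """
    coefficients, *_ = np.linalg.lstsq(breed_fractions, ebv, rcond=None)
    return coefficients


# Hypothetical year with two breeds (columns: AN, CH) and four calves.
fractions = np.array([
    [1.00, 0.00],   # purebred AN
    [0.00, 1.00],   # purebred CH
    [0.50, 0.50],   # F1
    [0.75, 0.25],   # backcross to AN
])
true_breed_means = np.array([2.0, 5.0])
ebv = fractions @ true_breed_means  # noise-free EBV, so the fit is exact

coefficients = yearly_breed_coefficients(fractions, ebv)
```

Repeating this for every birth year gives a series of breed coefficients that can be compared with the yearly average estimated breeding values of purebred calves.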
The number of purebred and crossbred (expressed as equivalent to purebred) calves
contributing to genetic changes of each breed is presented in Figure 3.3. To express the
number of crossbred calves as equivalent number of purebred calves, breed portions that
differed from 1.0 in the breed composition were added over all calves. Comparison
among breeds presented in Figure 3.3 reveals that, proportionally, there were more alleles
of Charolais, Limousin, and Simmental in crossbred animals than of Angus and
Hereford.
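The purebred-equivalent count described above amounts to summing, for one breed, the fractions carried by crossbred calves. A minimal sketch with made-up fractions (the function name is hypothetical):

```python
def purebred_equivalents(fractions_of_breed):
    """Return (number of purebred calves, crossbred calves as purebred equivalents).

    fractions_of_breed: fraction of one breed in each calf's breed composition.
    """
    n_purebred = sum(1 for f in fractions_of_breed if f == 1.0)
    # Breed portions that differ from 1.0 are added over all calves.
    crossbred_equivalent = sum(f for f in fractions_of_breed if 0.0 < f < 1.0)
    return n_purebred, crossbred_equivalent


# Hypothetical Charolais fractions for six calves.
charolais_fractions = [1.0, 0.5, 0.25, 1.0, 0.75, 0.0]
n_purebred, crossbred_equivalent = purebred_equivalents(charolais_fractions)
```

Here two purebred calves and three crossbred calves (0.5 + 0.25 + 0.75) contribute the equivalent of 1.5 further purebred calves.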
RESULTS
(Co)variance components
Estimates of direct additive genetic variance ($\sigma_a^2$), maternal additive genetic variance
($\sigma_m^2$), maternal permanent environmental variance ($\sigma_p^2$), residual variance ($\sigma_e^2$), and
direct by maternal additive genetic covariance ($\sigma_{am}$) of pre-weaning gain are presented in
Table 3.4. For ease of interpretation, variances were expressed as proportions of
phenotypic variance ($\sigma_t^2$), where $\sigma_t^2 = \sigma_a^2 + \sigma_m^2 + \sigma_{am} + \sigma_p^2 + \sigma_e^2$. Thus, $h_a^2 = \sigma_a^2 / \sigma_t^2$,
$h_m^2 = \sigma_m^2 / \sigma_t^2$, $p^2 = \sigma_p^2 / \sigma_t^2$, and $e^2 = \sigma_e^2 / \sigma_t^2$. The correlation between direct and
maternal genetic effects was calculated as $r_{am} = \sigma_{am} / (\sigma_a \sigma_m)$.
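As a consistency check on these definitions, the proportions reported in this chapter (0.32, 0.20, 0.12, and 0.52, with a direct-maternal correlation of –0.63) can be combined with the covariance term expressed as a proportion of phenotypic variance. The sketch below uses only those rounded values; the function name is illustrative.

```python
import math


def covariance_proportion(h2_direct: float, h2_maternal: float, r_am: float) -> float:
    """sigma_am / sigma_t^2, obtained from r_am = sigma_am / (sigma_a * sigma_m)."""
    return r_am * math.sqrt(h2_direct * h2_maternal)


# Rounded proportions of phenotypic variance reported in this chapter.
h2_direct, h2_maternal, p2, e2 = 0.32, 0.20, 0.12, 0.52
r_am = -0.63

cov_am = covariance_proportion(h2_direct, h2_maternal, r_am)
total = h2_direct + h2_maternal + cov_am + p2 + e2
```

With the rounded inputs, the covariance term is about –0.16 of the phenotypic variance and the five terms sum to approximately one, as the definition of the phenotypic variance requires.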
Multi-breed additive genetic changes
The yearly average estimated breeding values of purebred animals and the regression
of yearly estimated breeding values on contribution of each breed to the breed
composition of the animals for breeds Angus, Hereford, Charolais, Limousin, and
Simmental are depicted in Figure 3.4. The additive genetic changes in pre-weaning gain
per year are presented in Table 3.5.
All breeds showed positive additive genetic changes (P < 0.01). Estimates of genetic
changes obtained by the regression and by the average method should be similar if the
sample of breed alleles coming from purebred animals has similar additive genetic value
to the sample of breed alleles coming from crossbred animals. Average estimated
breeding values and regression coefficients for Angus were similar from 1986 to 1993. In
the last three years, regression coefficients were larger than average estimated breeding
values. Additive genetic changes of Charolais, Hereford, and Simmental followed patterns
similar to one another, with average estimated breeding values larger than regression
coefficients. These results suggest that alleles coming from purebred Charolais, Hereford,
and Simmental have higher additive genetic values than alleles coming from crossbred
animals. With regard to Limousin, average estimated breeding values were larger than
regression coefficients for most years, but there was no clear trend pointing out
differences between the two approaches.
To determine the influence of the correlation between direct and maternal genetic
effects on estimates of breed additive genetic changes, an additional analysis assuming a
zero correlation between direct and maternal genetic effects, based on national
recommendations for Canada (AAFC, 1993), was performed. Additive genetic changes
per year did not greatly differ from those presented in Table 3.5, where a correlation of
–0.63 between direct and maternal genetic effects was used. Differences in the genetic
changes per year (kg) were equal to or lower than 0.03% for all breeds.
Dominance and epistatic loss effects
Estimates of direct and maternal dominance and epistatic loss effects on pre-weaning
gain associated with breed heterozygosity are presented in Table 3.6. The magnitudes of
both dominance and epistatic loss effects were small. Expressed relative to the phenotypic
mean, direct and maternal dominance effects had a positive effect (P < 0.05) of 1.31%
and 2.28%, respectively. Direct and maternal epistatic loss effects were equal to –2.19%
(P < 0.05) and –0.08% (P > 0.05), respectively.
Breed additive effects
Estimates of direct and maternal breed additive effects on pre-weaning gain,
expressed as deviations from Angus, are presented in Table 3.7. Estimates of direct breed
additive effects of Hereford, Limousin, and Shorthorn were lower than estimates of
Angus. Salers slightly exceeded Angus (0.60 kg), while Charolais, Gelbvieh, Maine-
Anjou, and Simmental exceeded Angus by more than 10 kg for direct effects.
Estimates of maternal breed additive effects of Blonde d’Aquitaine, Charolais, and
Hereford were lower than Angus. Limousin and Maine-Anjou exceeded Angus by less
than one kg. Gelbvieh, Salers, Shorthorn, and Simmental exceeded Angus by more than
4.5 kg.
The standard errors of the estimates of both direct and maternal breed additive effects
were large for all the breeds and greater for those breeds represented by a small number of
calves (Blonde d’Aquitaine, Gelbvieh, Maine-Anjou, Salers, and Shorthorn).
Sampling correlations
The dataset used in this investigation came from commercial herds and, therefore, was
not designed to estimate breed additive, dominance, and epistatic loss effects.
Cunningham and Connolly (1989) showed that high correlation between estimates might
jeopardize the precision of estimation of genetic effects. Even estimable functions may be
highly confounded.
To assess the degree of confounding between estimates,
sampling correlations among additive, dominance, and epistatic loss effects were
calculated (Table 3.8). The sampling correlation between maternal dominance and direct
epistatic loss effects was very high, indicating that it was very difficult to separate
these two genetic effects. Sampling correlations between breeds
were generally high. Sampling correlations of direct breed additive effects with maternal
breed additive effects within the same breed were greater than sampling correlations
between different breeds. Thus, it was generally more difficult to separate direct and
maternal additive genetic effects within breeds than between breeds.
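The link between collinear predictors and high sampling correlations can be illustrated with an ordinary least squares analogue, where the sampling (co)variances of the estimates are proportional to the inverse of X'X. This is a simplified sketch with made-up predictor values; in the mixed model used here, the corresponding quantities would come from the inverse of the mixed model coefficient matrix.

```python
import numpy as np


def sampling_correlations(X: np.ndarray) -> np.ndarray:
    """Correlation matrix of OLS estimates, derived from (X'X)^-1."""
    C = np.linalg.inv(X.T @ X)
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)


# Two hypothetical coefficients that are nearly identical across records,
# standing in for a pair of highly collinear predictor variables.
h = np.array([0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.00])
e = np.array([0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.25])
X = np.column_stack([np.ones(len(h)), h, e])  # intercept, h, e

corr = sampling_correlations(X)
corr_h_e = corr[1, 2]  # sampling correlation between the two slope estimates
```

When two predictors are strongly positively correlated, the sampling correlation between their estimates is strongly negative, so one estimate can be traded off against the other with little change in fit.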
DISCUSSION
Estimates of $h_a^2$, $h_m^2$, $p^2$, and $e^2$ obtained in this study were compared to estimates
from previous studies of Miller (1996) in Ontario and with average results reported by
Koots et al. (1994a) in a review of a large number of published estimates of genetic
parameters. Estimates of $h_a^2$, $h_m^2$, $p^2$, and $e^2$ were equal to 0.32, 0.20, 0.12, and 0.52,
respectively. Estimates of $h_a^2$ and $h_m^2$ were in line with pooled estimates of 0.27 and 0.23,
respectively, reported by Koots et al. (1994a). The estimate of $h_a^2$ also did not greatly
differ from Sullivan et al. (1999), where an $h_a^2$ equal to 0.30 was used in the estimation of
genetic trends and mean genetic differences among breeds in Ontario. Estimates of $h_a^2$,
$h_m^2$, and $p^2$ were lower than estimates of 0.44, 0.25 and 0.15, respectively, obtained by
Miller (1996). Differences between estimates of (co)variance components of this study
and estimates of Miller (1996) are likely due to differences in the datasets and models
used. The dataset used by Miller (1996) was considerably smaller (75,365 records) and
direct and maternal epistatic loss effects were not fitted. Additional analyses dropping
epistatic loss effects, however, suggest that differences in the datasets were the main
cause of differences between estimates of variance components of the present study and
the study of Miller (1996).
The correlation between direct and maternal genetic effects on pre-weaning gain
(ram = –0.63), although lower in absolute value than the estimate of –0.77 obtained by
Miller (1996), was still strongly negative. This result is in marked contrast with the average
estimate of –0.25 reported by Koots et al. (1994b), and is larger in absolute value than the estimates of Meyer
(1992) and Robinson (1996), where average values of –0.59 and –0.47 were reported. A
possible cause contributing to the strong negative genetic correlation is the small
proportion of female calves with records that later had their own progeny. In the dataset
there were only 23,508 cases where a female calf later became a cow, corresponding to
approximately 10% of all female calves.
The two approaches used to estimate additive genetic changes of Angus, Charolais,
Hereford, Limousin, and Simmental indicated positive annual changes in these breeds.
Average genetic changes per year, estimated by both approaches, were lower than 0.20%
of the phenotypic mean for all breeds. Smith (1984), assuming single-trait selection,
calculated a theoretically possible rate of genetic change per year in beef cattle growth
traits of 1.4% of the mean. The same author reported that rates of genetic change of 0.7
and 0.3% per year were achieved in long-term selection experiments and breeding
programs in practice, respectively. Selection practices in Ontario are based on multiple-
trait selection and, therefore, lower genetic changes in individual traits than under single
trait selection are expected. The genetic improvement, however, will be balanced over
several traits, improving the overall economic merit.
The regression approach for calculating yearly genetic means for each breed allowed
the contribution of crossbred and purebred animals in the population to be included,
making full use of all available information when calculating additive genetic changes.
Comparison of results obtained by the regression approach with the traditional average
estimated breeding values of purebred animals revealed that producers used animals of
substantially higher additive genetic value to produce purebred Charolais, Hereford, and
Simmental than to produce crossbred animals, which could reflect different selection
goals, population sizes, and sire availability per breed. The additive genetic value of
Angus and Limousin for producing purebred or crossbred animals tended to be similar.
Both direct and maternal dominance effects showed a favourable effect on pre-
weaning gain. Direct and maternal epistatic loss effects had the anticipated negative
effect. However, for dominance and epistatic loss effects, the magnitude of the estimates
was small. Epistatic effects on a specific trait may be either favorable or unfavorable,
depending on selection history of the population and genetic correlations among traits.
Favorable epistatic effects may result from direct selection for a particular trait, while
unfavorable effects may result from correlated response of traits with antagonistic genetic
correlation (Cassady et al., 2002). Direct and maternal dominance were equivalent to
1.31% and 2.28% of the phenotypic average. Direct and maternal epistatic loss effects
were equivalent to –2.19% and –0.08%, respectively. The maternal epistatic loss effect
was statistically not different from zero, probably reflecting the deficiency in the structure
of data to estimate this genetic effect, as shown in Table 3.2. To detect a significant
effect, a larger proportion of crossbred sires might be required. The small proportion of
crossbred sires (and grandsires) in the dataset has two consequences. Firstly, it reduces
the expression of epistatic loss because at least one allele at each locus will be from a
parental breed in a large proportion of crossbred progeny, which reduces the breakdown
of favorable interactions established in the pure breeds (Kinghorn, 1983). Secondly, it
increases the dependence between dominance and epistatic effects, causing collinearity
between these two genetic effects. Additional analyses, assuming a zero covariance
between direct and maternal genetic effects (not reported), resulted in estimates of direct
and maternal dominance and epistatic loss effects similar to those obtained when a non-
zero covariance was used.
According to results obtained by Gregory et al. (1997) in a large beef cattle
crossbreeding experiment, the heterosis observed in growth traits of beef cattle is likely
due to dominance effects. This observation allows fitting heterosis as being proportional
to the probability that alleles at a locus come from different breeds, which is equal to the
breed heterozygosity. Further analysis fitting only dominance effects in the model
(excluding epistatic loss effects) resulted in estimates of direct and maternal dominance of
1.31% and 1.84%, respectively. Therefore, estimates of dominance effects from both
models did not greatly differ. These results were also in close agreement with Miller
(1996), who reported estimates of direct and maternal heterosis of 1.34% and 2.28%,
respectively, assuming a dominance model. In the multi-breed genetic evaluation
currently run in Ontario, records are pre-adjusted for heterosis on the basis of
heterozygosity. For pre-weaning growth traits, direct and maternal heterosis of 5% are
assumed for an individual with heterozygosity of 100%, regardless of the breeds involved
(Sullivan et al., 1999).
Koch et al. (1985) evaluated dominance and epistatic loss effects on weaning gain of
Angus × Hereford crosses. In their study, direct dominance and epistatic loss effects were
not significant despite the relatively large negative values. They stated that a larger
dataset and a more complete array of mating types would be needed to attain statistically
significant results. In a review of a large number of experimental results including beef
cattle, dairy cattle, pigs, poultry, and sheep, Sheridan (1981) found that, in many cases,
the level of heterosis in crossbreeding populations other than the F1 was substantially
below expectation on the basis of heterozygosities. The conclusion of the review of
Sheridan (1981) was that, based on the performance of purebred and F1 populations, it
was not possible to predict the level of heterosis in other various genotypes, suggesting
the presence of epistatic effects. According to Cunningham (1987), although in some
cases epistatic loss effects can be safely neglected, their proper evaluation is one of the
unsolved problems of animal breeding research. Recent studies have reported epistatic
loss on pre-weaning gain in crosses between Bos taurus and Bos indicus (Fries et al.,
2000; Piccoli et al., 2002; Demeke et al., 2003; Pimentel et al., 2003; Cardoso, 2004).
Because Bos taurus and Bos indicus have greater genetic distance (larger potential
differences in gene frequencies), Bos taurus x Bos indicus crosses generally express a
higher level of heterosis in comparison to crosses between Bos taurus breeds. As a
consequence, greater epistatic loss is expected in their crosses.
Standard errors of maternal dominance and direct epistatic loss effects were large, in
comparison to standard errors of direct dominance and maternal epistatic loss effects.
Estimates of maternal dominance and direct epistatic loss effects were of comparable
magnitude, although opposite in sign. The sampling correlation between estimates was
very high, likely due to a structural deficiency of the data to separate maternal dominance
and epistatic loss effects and/or due to linear dependencies (multicollinearity) involving
predictor variables of maternal dominance (HM) and direct epistatic loss (ED) effects.
Estimates of breed additive effects were in general agreement with what was
expected based on previous studies of Miller (1996). Further analysis assuming a zero
genetic covariance between direct and maternal genetic effects resulted in small changes
in the estimates of breed effects and no changes in the rank of the breeds (not reported).
Standard errors of the estimates of breed effects and sampling correlations between
estimates, particularly between direct and maternal breed effects, were high. These results
could be a symptom of insufficient information to estimate both direct and maternal
breed additive effects and/or multicollinearity among corresponding predictor variables.
With a high degree of multicollinearity, estimates of regression coefficients obtained by
ordinary least squares methods typically have large standard errors, indicating that they
could be highly confounded. In addition, a high degree of multicollinearity would result
in breed estimates that are sensitive to changes in the dataset.
Because estimates of breed effects comprise part of the across-breed estimated
breeding values (ABC) used as selection criteria for across breed comparisons in the beef
industry, lack of enough information in the data to adequately separate breed additive
effects and/or multicollinearity among predictor variables of breed effects may result in
less reliable ranking of the animals.
CONCLUSIONS
Estimates of (co)variance components of pre-weaning gain of beef cattle did not
greatly differ from previous studies in Ontario. The large estimated negative genetic
correlation between direct and maternal effects seems to be more likely a consequence of
lack of enough information in the dataset to separate these effects than an indication of a
true negative relationship.
Annual additive genetic changes were positive for all major breeds evaluated
(Angus, Charolais, Hereford, Limousin, and Simmental). The traditional approach for
estimating genetic changes based on information from purebred calves and the alternative
approach based on information from both purebred and crossbred calves revealed
differences in selection practices among breeds. Producers used animals of substantially
higher additive genetic values to produce purebred Charolais, Hereford, and Simmental
than to produce crossbred animals. Producers of Angus and Limousin used animals of
similar genetic values to produce both purebred and crossbred animals.
Both direct and maternal dominance effects caused a favourable effect on pre-
weaning gain. Direct epistatic loss reduced the performance of the animals, whereas
maternal epistatic loss did not significantly affect the pre-weaning gain.
Results from this study accumulated more evidence that the level of direct and
maternal heterosis on pre-weaning gain in Ontario is lower than 5%, which indicates that
this assumed level should be reviewed in the multi-breed genetic evaluation.
Breeds ranked similarly to what was expected, but estimates were highly unstable,
showing high standard errors. Further investigation to detect the causes of instability and
application of alternative statistical methods are warranted.
Table 3.1. Coefficients of direct (HD) and maternal (HM) dominance and direct (ED)
and maternal (EM) epistatic loss genetic effects for different mating systems
involving two breeds, A and B
Sire   Dam   fA a   HD    HM    ED    EM
Parental
A      A     1      0     0     0     0
B      B     0      0     0     0     0
F1
A      B     ½      1     0     0     0
B      A     ½      1     0     0     0
Backcrosses
A      AB    ¾      ½     1     ½     0
B      AB    ¼      ½     1     ½     0
AB     B     ¼      ½     0     ½     0
AB     A     ¾      ½     0     ½     0
Advanced generations
F1     F1    ½      ½     1     1     0
F2     F2    ½      ½     ½     ½     1
F3     F3    ½      ½     ½     ½     ½
a Fraction of breed A in the breed composition of the animal
Table 3.2. Distribution of observations among coefficients of direct (HD) and
maternal (HM) dominance and direct (ED) and maternal (EM) epistatic loss genetic
effects
Class a HD HM ED EM
0.000 184,115 285,368 261,239 413,347
0.125 18,825 9,159 32,202 10,324
0.250 15,414 9,476 29,755 11,710
0.375 3,322 759 10,764 1,563
0.500 62,701 27,320 119,984 33,332
0.625 8,622 5,568 15,871 6,137
0.750 6,722 2,193 4,184 1,647
0.875 2,630 352 434 10
1.000 176,115 138,271 4,033 396
a Coefficients of dominance and epistatic loss effects were grouped in classes of 0.125,
ranging from zero to one. Every class included fractions equal to or smaller than the
mentioned class.
Table 3.3. Mean and standard deviation (SD) of pre-weaning gain (PWG), weaning
age (Age), coefficients of direct and maternal breed additive, dominance (HD and
HM), and epistatic loss (ED and EM) genetic effects
Trait a Mean ± SD
PWG (kg) 203.47 ± 49.41
Age (days) 203.95 ± 30.87
HD 0.47 ± 0.44
HM 0.34 ± 0.45
ED 0.19 ± 0.24
EM 0.05 ± 0.15
Direct Maternal
Breed Mean ± SD b Calves c Mean ± SD Dams d
Angus 0.59 ± 33 77,324 0.74 ± 29 62,612
Blond D’Aquitane 0.61 ± 28 15,469 0.79 ± 25 8,143
Charolais 0.58 ± 27 163,148 0.71 ± 27 109,477
Gelbvieh 0.58 ± 26 3,750 0.73 ± 25 1,807
Hereford 0.56 ± 33 235,681 0.75 ± 29 225,368
Limousin 0.60 ± 24 115,137 0.72 ± 25 59,127
Maine-Anjou 0.54 ± 27 8,769 0.70 ± 28 5,818
Salers 0.70 ± 28 3,051 0.81 ± 24 8,830
Shorthorn 0.42 ± 30 30,503 0.64 ± 30 28,722
Simmental 0.58 ± 29 140,667 0.69 ± 26 107,581
Unknown 0.18 ± 14 101,902 0.40 ± 25 75,676
a Coefficients of direct and maternal breed additive, dominance, and epistatic loss genetic
effects range from zero to one.
b Only animals containing some portion of the indicated breed were included for
calculating mean and standard deviation.
c Number of calves containing some portion of the indicated breed.
d Number of dams containing some portion of the indicated breed.
Table 3.4. Estimates of (co)variance components and genetic parameters of pre-
weaning gain (kg)
(Co)variance component a   Estimate            Parameter b   Estimate
$\sigma_a^2$               254.47 ± 8.54       $h_a^2$       0.32
$\sigma_m^2$               161.17 ± 9.28       $h_m^2$       0.20
$\sigma_p^2$               94.12 ± 5.38        $p^2$         0.12
$\sigma_e^2$               408.18 ± 4.78       $e^2$         0.52
$\sigma_{am}$              –128.61 ± 7.76      $r_{am}$      –0.63 ± 0.02
a $\sigma_a^2$ = direct additive genetic variance.
$\sigma_m^2$ = maternal additive genetic variance.
$\sigma_p^2$ = maternal permanent environmental variance.
$\sigma_e^2$ = residual variance.
$\sigma_{am}$ = direct by maternal additive genetic covariance.
b Variance component as a proportion of phenotypic variance ($\sigma_t^2$).
$\sigma_t^2 = \sigma_a^2 + \sigma_m^2 + \sigma_{am} + \sigma_p^2 + \sigma_e^2$
For $\sigma_{am}$, genetic correlation is shown.
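As a check on how the parameters in Table 3.4 follow from the (co)variance components, the arithmetic can be sketched as below. The variable names are illustrative only, and because the inputs are the rounded table entries, the results agree with the tabulated parameters only to within rounding.

```python
import math

# (Co)variance component estimates copied from Table 3.4
var_a, var_m, var_p, var_e = 254.47, 161.17, 94.12, 408.18
cov_am = -128.61

# Phenotypic variance as defined in footnote b of the table
var_t = var_a + var_m + cov_am + var_p + var_e

h2_a = var_a / var_t      # direct heritability
h2_m = var_m / var_t      # maternal heritability
p2 = var_p / var_t        # maternal permanent environmental proportion
e2 = var_e / var_t        # residual proportion

# Genetic correlation between direct and maternal effects
r_am = cov_am / math.sqrt(var_a * var_m)

print(f"h2_a={h2_a:.3f} h2_m={h2_m:.3f} p2={p2:.3f} e2={e2:.3f} r_am={r_am:.3f}")
```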
Table 3.5. Multi-breed additive genetic changes in pre-weaning gain per year
obtained through regression of average estimated breeding values of purebred
calves on birth year (Average) and through regression of estimated breeding values
on contribution of each breed to the breed composition of the calves (Regression)
Breed       Average (kg)              Regression (kg)
Angus       0.28 ± 0.05 (0.14%) a     0.28 ± 0.04 (0.14%)
Charolais   0.22 ± 0.02 (0.11%)       0.17 ± 0.02 (0.08%)
Hereford    0.35 ± 0.03 (0.17%)       0.27 ± 0.02 (0.13%)
Limousin    0.10 ± 0.03 (0.05%)       0.12 ± 0.02 (0.06%)
Simmental   0.27 ± 0.03 (0.13%)       0.25 ± 0.02 (0.12%)
a Values between parentheses were expressed as percentages of the overall phenotypic
average.
Table 3.6. Estimates and standard errors of direct and maternal dominance (H) and
epistatic loss (E) effects on pre-weaning gain (kg)
     Direct                       Maternal
H    2.67 ± 0.20 (1.31%) a        4.64 ± 0.83 (2.28%)
E    –4.45 ± 1.63 (–2.19%)        –0.16 ± 0.39 (–0.08%)
a Values between parentheses are expressed relative to the overall phenotypic average.
Table 3.7. Estimates (as deviations from Angus) and standard errors of direct and
maternal breed additive effects for pre-weaning gain (kg)
Breed Direct Maternal
Blond D’Aquitane 5.34 ± 4.10 –5.68 ± 2.32
Charolais 13.19 ± 3.61 –2.88 ± 1.86
Gelbvieh 10.41 ± 4.69 7.91 ± 3.12
Hereford –6.26 ± 3.64 –3.21 ± 1.86
Limousin –3.07 ± 3.66 0.61 ± 1.90
Maine-Anjou 12.29 ± 4.59 0.25 ± 2.52
Salers 0.60 ± 4.27 7.55 ± 2.49
Shorthorn –9.30 ± 4.29 4.57 ± 2.23
Simmental 14.19 ± 3.64 5.33 ± 1.87
Table 3.8. Sampling correlations among estimates of direct (D) and maternal (M)
fixed genetic effects
HD a ED AND BDD CHD GVD HED LMD MAD SAD SHD SMD
HD 1.00
ED –0.01 1.00
AND 0.00 0.32 1.00
BDD –0.01 0.28 0.84 1.00
CHD 0.01 0.32 0.94 0.86 1.00
GVD –0.01 0.24 0.73 0.67 0.74 1.00
HED 0.04 0.31 0.93 0.84 0.95 0.72 1.00
LMD –0.00 0.31 0.93 0.85 0.96 0.73 0.94 1.00
MAD 0.00 0.23 0.74 0.67 0.75 0.58 0.74 0.75 1.00
SAD –0.01 0.26 0.79 0.71 0.80 0.61 0.79 0.79 0.64 1.00
SHD 0.04 0.25 0.77 0.70 0.79 0.60 0.78 0.78 0.64 0.66 1.00
SMD 0.02 0.32 0.94 0.85 0.96 0.73 0.94 0.95 0.75 0.79 0.78 1.00
HM –0.01 –0.98 –0.31 –0.28 –0.32 –0.24 –0.31 –0.31 –0.23 –0.25 –0.24 –0.31
EM –0.01 –0.00 –0.01 –0.01 –0.01 –0.00 –0.01 –0.01 –0.00 –0.00 –0.00 –0.01
ANM 0.01 –0.30 –0.96 –0.81 –0.91 –0.70 –0.89 –0.90 –0.71 –0.76 –0.74 –0.90
BDM 0.06 –0.24 –0.74 –0.89 –0.76 –0.59 –0.74 –0.75 –0.59 –0.63 –0.61 –0.75
CHM 0.03 –0.31 –0.92 –0.83 –0.97 –0.72 –0.92 –0.93 –0.73 –0.78 –0.76 –0.93
GVM 0.05 –0.18 –0.55 –0.50 –0.55 –0.77 –0.54 –0.55 –0.44 –0.46 –0.45 –0.55
HEM –0.03 –0.30 –0.91 –0.82 –0.93 –0.70 –0.97 –0.92 –0.72 –0.77 –0.77 –0.92
LMM 0.07 –0.30 –0.90 –0.82 –0.92 –0.71 –0.90 –0.96 –0.72 –0.76 –0.75 –0.91
MAM 0.00 –0.21 –0.67 –0.61 –0.69 –0.53 –0.67 –0.68 –0.91 –0.58 –0.58 –0.68
SAM 0.05 –0.22 –0.67 –0.61 –0.68 –0.53 –0.67 –0.68 –0.54 –0.86 –0.56 –0.68
SHM –0.04 –0.23 –0.74 –0.67 –0.76 –0.58 –0.75 –0.75 –0.61 –0.63 –0.96 –0.75
SMM 0.01 –0.31 –0.91 –0.83 –0.93 –0.71 –0.91 –0.92 –0.73 –0.77 –0.76 –0.97
Table 3.8. Continuation …
HM EM ANM BDM CHM GVM HEM LMM MAM SAM SHM SMM
HM 1.00
EM 0.00 1.00
ANM 0.30 0.01 1.00
BDM 0.25 0.00 0.74 1.00
CHM 0.32 0.01 0.92 0.77 1.00
GVM 0.19 0.00 0.55 0.47 0.56 1.00
HEM 0.31 0.02 0.91 0.75 0.94 0.55 1.00
LMM 0.30 0.00 0.90 0.76 0.94 0.56 0.92 1.00
MAM 0.21 0.01 0.68 0.56 0.69 0.41 0.68 0.69 1.00
SAM 0.22 0.00 0.67 0.56 0.69 0.41 0.68 0.68 0.52 1.00
SHM 0.24 0.01 0.74 0.61 0.77 0.45 0.77 0.75 0.58 0.56 1.00
SMM 0.31 0.01 0.91 0.76 0.94 0.56 0.93 0.92 0.69 0.69 0.76 1.00
a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, and SM = Simmental.
[Bar chart. x-axis: Calves, Sires, Dams; y-axis: Percentage; bars: 1 breed, 2 breeds, 3 breeds, 4 breeds]
Figure 3.1. Percentage of calves, sires, and dams with 1, 2, 3, or 4 breeds in the genetic
composition in the dataset containing 478,466 calves, 19,908 sires, and 234,608 dams
[Three bar charts. x-axis: Breed (AN, BD, CH, GV, HE, LM, MA, SA, SH, SM, UN); y-axes: Number of calves, Number of sires, Number of dams; bars: Purebred, Crossbred]
Figure 3.2. Number of purebred and crossbred calves, sires, and dams containing some portion of the indicated breed in dataset including 478,466 calves, 19,908 sires, and 234,608 dams
[Bar chart. x-axis: Breed (AN, BD, CH, GV, HE, LM, MA, SA, SH, SM, UN); y-axis: Number of calves; bars: Purebred, Crossbred]
Figure 3.3. Numbers of purebred and crossbred (expressed as equivalent to purebred)
calves per breed
[Five line charts, one per breed: Angus, Charolais, Hereford, Limousin, Simmental. x-axis: Birth year (1986 to 2000); y-axis: Estimated breeding value (kg); series: Average, Regression]
Figure 3.4. Multi-breed additive genetic changes in pre-weaning gain obtained through
average estimated breeding values of purebred calves per birth year (Average) and through
regression of yearly estimated breeding values on contribution of each breed to the breed
composition of the calves (Regression)
Chapter 4
Estimation of genetic effects in the presence of
multicollinearity
ABSTRACT - A framework using generalized ridge regression methods was developed
for obtaining stable estimates of direct and maternal breed additive, dominance, and
epistatic loss effects when multicollinearity among predictor variables is of concern.
Pre-weaning gain records of calves collected through Beef Improvement Ontario (BIO) from 1986 to
1999 were analyzed. The genetic model included fixed genetic effects of breed,
dominance, and epistatic loss, fixed environmental effects of age of the calf,
contemporary group, and age of the dam by sex of the calf, random additive direct and
maternal genetic effects, and random maternal permanent environment effect. The degree
and the nature of the multicollinearity among predictor variables of breed additive,
dominance, and epistatic loss effects were identified and ridge regression methods were
used as an alternative to ordinary least squares (LS). Ridge parameters were obtained
using two different objective methods: Generalized ridge estimator of Hoerl and Kennard
(R1) and bootstrap in combination with cross-validation (R2). Estimates from R1 and R2
were compared to estimates from LS on the basis of mean squared error of predictions
(MSEP) and variance inflation factors (VIF), computed over one hundred bootstrap
samples. Both ridge regression methods outperformed the LS estimator with respect to
MSEP and VIF. The MSEP of R1 and R2 were similar and 3% lower than the MSEP
of LS. Average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively.
Ridge regression methods were particularly effective in reducing the multicollinearity
involving predictor variables of breed additive effects. Due to a high degree of
confounding between estimates of maternal dominance and direct epistatic loss effects it
was not possible to compare the relative importance of effects with a high level of
confidence. The inclusion of epistatic loss effects in the additive-dominance model did
not cause noticeable re-ranking of sires, dams, and calves based on across-breed
estimated breeding values. More stable estimates of breed effects as a result of this study
will contribute to more accurate across-breed estimated breeding values, which is an
important criterion of selection in beef cattle.
Key words: additive, bootstrap, crossbreeding, dominance, epistatic loss, genetic
evaluation, non-additive, ridge regression.
Abbreviations: ABC, across-breed estimated breeding value; CI, condition index; D,
dominance effect; E, epistatic loss effect; EBV, estimated breeding value; ED, coefficient
of direct epistatic loss effect; EM, coefficient of maternal epistatic loss effect; HD,
coefficient of direct dominance effect; HM, coefficient of maternal dominance effect;
MSE, mean square error; MSEP, mean squared error of prediction; VIF, variance
inflation factor.
INTRODUCTION
Breed additive, dominance, and epistatic loss effects are of concern in the genetic
evaluation of a multi-breed population. For estimating these effects, a multiple regression
equation including predictor variables such as breed composition of the calf and of the
dam, expected direct and maternal breed heterozygosities, and functions of the
heterozygosities can be used (Koch et al. 1985; Pimentel et al., 2004). Interpretation of
the estimates depends on the assumption that the predictor variables are not strongly
interrelated.
If there are strong linear relationships among predictor variables, the interpretation of
the corresponding estimates may not be valid because it is difficult to estimate the unique
effect of an individual variable in the regression equation. Typically, when strong linear
relationships exist, the regression coefficients have large standard errors, may have signs
opposite to what would be expected, and are sensitive to changes in the data file and
to addition or deletion of variables in the model, which complicates modeling
(Belsley, 1991). In addition, when considered in combination, the estimated coefficients
often cancel out. This problem is known as collinearity or multicollinearity (Weisberg,
1985). In the presence of multicollinearity, least squares estimates are not adequate
because they are unstable.
Various alternatives to least squares have been suggested to deal with
multicollinearity problems. One such alternative is the ridge regression, which was
introduced by Hoerl (1962) and Hoerl and Kennard (1970a, 1970b). The ridge estimators
are biased, but might be useful in providing estimates that are more precise and, therefore,
more stable than least squares estimates when multicollinearity is of concern.
The ridge estimator is obtained by solving the system of equations
$(X'X + kI)\,\hat{b}_k = X'y$
to give
$\hat{b}_k = (X'X + kI)^{-1} X'y$,
where k is the ridge parameter or perturbation constant, with k > 0, and I is an identity
matrix. In a generalized form, kI is replaced by a matrix K, where
$K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, with $k_i \ge 0$. Many methods have been proposed in the
literature for selecting appropriate $k_i$ values (Gruber, 1998), but there is no consensus
on which method is the most adequate.
In general, the best method to estimate an optimal K depends on the data and model used.
From a Bayesian viewpoint (Goldstein and Smith, 1974; Draper and Smith, 1998;
Sorensen and Gianola, 2002), the ridge regression can be considered as an estimate of b
from the data subject to prior knowledge about the parameter, which is supplemented by
the ridge parameter k. Given that $k = \sigma^2/\sigma_b^2$, where $\sigma^2$ is the residual variance and
$\sigma_b^2$ is a measure of the spread of the elements of b, large values of k imply an a priori
belief that more restricted values of b are more likely than larger values, while small
values of k imply an a priori belief that a quite large range of values of b is not
unreasonable.
The objective of this study was to develop a framework using ridge regression
methods for obtaining stable estimates of direct and maternal breed additive, dominance,
and epistatic loss genetic effects in the presence of multicollinearity.
MATERIAL AND METHODS
Data
The data were pre-weaning weight gain records of animals from beef herds enrolled with
Beef Improvement Ontario (BIO), from 1986 to 1999. The dataset after preliminary edits
consisted of 869,050 records, including records of both purebred and crossbred animals.
A subset including purebred and crossbred animals from the 10 most common breeds
(Angus, Blond D’Aquitane, Charolais, Gelbvieh, Hereford, Limousin, Maine-Anjou,
Salers, Shorthorn, and Simmental), containing 478,466 records, was chosen for the
analysis. Portions of undetermined breed in the breed composition were treated as another
breed, named Unknown (UN). A summary of the data is presented in Chapter 3.
Predictor variables of fixed genetic effects
(1) Breed additive effects
Coefficients of direct and maternal breed additive effects were equal to the genetic
contribution of each breed to the breed composition of the calf and to the breed
composition of the dam, respectively. Linear dependencies among breed additive effects
required a restriction to obtain these estimates. The estimates for direct and maternal
breed additive effects were reported relative to the Angus breed.
(2) Dominance effects
Coefficients of direct dominance (HD) and maternal dominance (HM) genetic effects
were equal to expected direct and maternal breed heterozygosities, respectively. HD and
HM were calculated using the following equations:
$H_D = 1 - \sum_{i=1}^{nb} S_i \times D_i$
and
$H_M = 1 - \sum_{i=1}^{nb} MGS_i \times MGD_i$,
where nb is the number of breeds (11), and Si, Di, MGSi, and MGDi are the fractions of
the ith breed for the sire, dam, maternal grandsire, and maternal granddam breed
composition, respectively.
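The heterozygosity coefficients can be illustrated with a minimal sketch, assuming that breed compositions are stored as dictionaries of breed fractions. The function name and the two-breed example are hypothetical; the resulting values correspond to the HD column of Table 3.1.

```python
def breed_heterozygosity(parent1, parent2):
    """Expected breed heterozygosity: 1 - sum_i p1_i * p2_i.

    parent1 and parent2 map breed codes to breed fractions (summing to 1).
    For HD the parents are the sire and dam of the calf; for HM they are
    the maternal grandsire and maternal granddam.
    """
    breeds = set(parent1) | set(parent2)
    return 1.0 - sum(parent1.get(b, 0.0) * parent2.get(b, 0.0) for b in breeds)


# Two-breed example following Table 3.1
A = {"A": 1.0}
B = {"B": 1.0}
AB = {"A": 0.5, "B": 0.5}   # F1 animal

hd_purebred = breed_heterozygosity(A, A)     # A x A
hd_f1 = breed_heterozygosity(A, B)           # A x B
hd_backcross = breed_heterozygosity(A, AB)   # A x AB
hd_f2 = breed_heterozygosity(AB, AB)         # F1 x F1
print(hd_purebred, hd_f1, hd_backcross, hd_f2)
```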
(3) Epistatic loss effects
For estimating epistatic loss effects, the parents of an individual were assumed to
produce more recombinant gametes the larger their breed heterozygosities. Thus, the
coefficients of epistatic loss effects for direct (ED) and maternal (EM) effects were
calculated as the average breed heterozygosities in uniting gametes that generated the
individual (Fries et al., 2000). ED and EM were calculated as:
$E_D = 0.5\,(H_{Sire} + H_{Dam})$
and
$E_M = 0.5\,(H_{MGS} + H_{MGD})$,
where $H_{Sire}$, $H_{Dam}$, $H_{MGS}$, and $H_{MGD}$ are the expected breed heterozygosities of the sire,
dam, maternal grandsire, and maternal granddam, respectively.
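A small sketch of the epistatic loss coefficients follows, again assuming breed compositions held as dictionaries of breed fractions. The helper names are hypothetical; the example reproduces the ED and EM entries of three rows of Table 3.1.

```python
def heterozygosity(sire, dam):
    """Expected breed heterozygosity of an animal from its parents' compositions."""
    breeds = set(sire) | set(dam)
    return 1.0 - sum(sire.get(b, 0.0) * dam.get(b, 0.0) for b in breeds)


def epistatic_loss(h_parent1, h_parent2):
    """Average breed heterozygosity of the two parents (or maternal grandparents)."""
    return 0.5 * (h_parent1 + h_parent2)


A, B = {"A": 1.0}, {"B": 1.0}
AB = {"A": 0.5, "B": 0.5}

h_pure = heterozygosity(A, A)    # purebred animal
h_f1 = heterozygosity(A, B)      # F1 animal
h_f2 = heterozygosity(AB, AB)    # F2 animal (F1 x F1 parents)

# Backcross calf, sire A x dam AB (F1): one purebred and one F1 parent
ed_backcross = epistatic_loss(h_pure, h_f1)
# Calf from F1 x F1: both parents are F1
ed_f1_x_f1 = epistatic_loss(h_f1, h_f1)
# Calf from F2 x F2: parents are F2, maternal grandparents are F1
ed_f2_x_f2 = epistatic_loss(h_f2, h_f2)
em_f2_x_f2 = epistatic_loss(h_f1, h_f1)
print(ed_backcross, ed_f1_x_f1, ed_f2_x_f2, em_f2_x_f2)
```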
Multicollinearity diagnostics
For identifying possible linear dependencies between the covariates included in the
analysis, different measures of the degree of multicollinearity were obtained.
(1) Variance inflation factor
The variance inflation factor is the most popular measure of multicollinearity. If $R_i^2$ is
the coefficient of determination resulting when the predictor variable $X_i$ is regressed on
all the remaining predictor variables, the variance inflation factor for $X_i$ ($VIF_i$) is given
by
$VIF_i = \frac{1}{1 - R_i^2}$.
In ordinary least squares (LS), the VIFs are the diagonal elements of the inverse of
the simple correlation matrix. The VIFs indicate the inflation in the variance of each
regression coefficient relative to a situation of orthogonality. The magnitude of a VIF to be
considered high is essentially arbitrary. Usually, values larger than 10 suggest that
multicollinearity may be causing estimation problems (Chatterjee et al., 2000).
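The VIF computation can be sketched with NumPy as the diagonal of the inverse correlation matrix. The simulated predictors below are an assumption made purely for illustration (the third is nearly the sum of the first two); they are not the thesis data.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIFs as the diagonal of the inverse of the predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


# Simulated predictors: x3 is almost a linear combination of x1 and x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vif = variance_inflation_factors(X)
print(vif)   # all three values far exceed the usual threshold of 10
```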
(2) Condition index
In the presence of multicollinearity, the determinant of the correlation matrix is very
low. Because the determinant is also equal to the product of the eigenvalues $\lambda_i$, the presence
of one or more small eigenvalues results in a small determinant, thereby indicating
multicollinearity. A measure of multicollinearity called condition index (CI) is obtained
for each eigenvalue by
$CI_i = \sqrt{\lambda_{max} / \lambda_i}$,
where $\lambda_{max}$ is the largest eigenvalue and $\lambda_i$ is the ith eigenvalue of the correlation matrix.
A large $CI_i$ indicates dependencies among covariates, since $\lambda_i$ will be close to zero. Belsley
(1991) suggests that CI between 10 and 30 are of interest, indicating possible problems
of multicollinearity, and that CI larger than 30 provide reasonable evidence of considerable
multicollinearity.
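A minimal sketch of the condition index calculation is given below. It assumes the square-root form of the index computed on the eigenvalues of the correlation matrix, which is the scale to which the thresholds of 10 and 30 refer, and it uses simulated predictors rather than the thesis data.

```python
import numpy as np


def condition_indices(X):
    """Condition index for each eigenvalue of the predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)          # ascending order
    return np.sqrt(eigvals.max() / eigvals)


# Simulated predictors with one near-linear dependency
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

ci = condition_indices(X)
print(ci)   # the index of the smallest eigenvalue is listed first
```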
(3) Variance-decomposition proportions associated with the eigenvalues
This statistic indicates which variables are involved in linear dependencies and how
much of the variance of the parameter estimate is associated with each eigenvalue.
Following Belsley (1991),
$Var(\hat{b}) = \sigma^2 (X'X)^{-1} = \sigma^2 V \Lambda^{-1} V'$,
where $\sigma^2$ is the residual variance estimate, V is a matrix containing the eigenvectors, and
$\Lambda$ is a diagonal matrix of eigenvalues, $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$. Writing $V = (v_{ij})$, the variance of
the ith component of the regression coefficient vector b can be decomposed into a sum of
components, each associated with one eigenvalue, as
$Var(\hat{b}_i) = \sigma^2 \sum_{j=1}^{p} \frac{v_{ij}^2}{\lambda_j}$,
where p is the number of predictor variables.
Because the eigenvalues appear in the denominator, those components of the variance
associated with dependencies (small $\lambda_j$) will be relatively large compared to the other
components. Thus, a high proportion of two or more coefficients associated with the same
small eigenvalue provides evidence that corresponding dependencies are causing
problems.
Let $t_{ij} = v_{ij}^2 / \lambda_j$ and $t_i = \sum_{j=1}^{p} t_{ij}$, with i = 1, … , p. The proportion of the variance of the ith
regression coefficient associated with the jth component of its decomposition is obtained
as
$\pi_{ji} = t_{ij} / t_i$, with i, j = 1, … , p.
An approach recommended by Belsley et al. (1980) is to identify eigenvalues $\lambda_j$
having CI greater than 30. Variables with variance-decomposition proportions $\pi_{ji}$ larger
than 0.5 for each of these eigenvalues are candidates for linear dependencies. The
measures of multicollinearity were obtained using the regression procedure, option
COLLINOINT, of the SAS statistical software (SAS Institute Inc., 1990).
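The variance-decomposition proportions can be sketched as follows. This is an illustrative NumPy version working on the correlation matrix of simulated predictors, not the SAS procedure used in the study; the function name and the data are assumptions.

```python
import numpy as np


def variance_decomposition(X):
    """Condition indices and variance-decomposition proportions.

    Returns (ci, proportions), where proportions[i, j] is the share of the
    variance of coefficient i associated with eigenvalue j.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, V = np.linalg.eigh(R)             # columns of V are eigenvectors
    t = V**2 / eigvals                         # t[i, j] = v_ij^2 / lambda_j
    proportions = t / t.sum(axis=1, keepdims=True)
    ci = np.sqrt(eigvals.max() / eigvals)
    return ci, proportions


# Simulated predictors: x3 is almost x1 + x2, so all three share one dependency
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

ci, proportions = variance_decomposition(X)
worst = np.argmax(ci)                          # eigenvalue with the largest CI
involved = np.where(proportions[:, worst] > 0.5)[0]
print(ci[worst], involved)
```

Each row of `proportions` sums to one, and the row sums of the unnormalised terms equal the VIFs, which ties this diagnostic back to the variance inflation factor.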
Genetic analysis
The genetic model for pre-weaning gain, defined in matrix notation, was:
y = Xb + Fv + Za + Wm + Sp + e, (1)
where
y = vector of observations;
b = vector of fixed genetic effects. This vector included direct and maternal breed
additive, dominance, and epistatic loss effects;
v = vector of fixed environmental effects. This vector included age of the calf as a
covariate (linear and quadratic effects), and age of the dam by sex of the calf and
contemporary group (herd-year-season-management group) as classification variables;
a = vector of random direct additive genetic effects;
m = vector of random maternal additive genetic effects;
p = vector of random maternal permanent environmental effects; and
e = vector of random residual effects.
X, F, Z, W, and S are incidence matrices relating records to fixed genetic, fixed
environmental, direct genetic, maternal genetic, and permanent environment effects,
respectively.
Random effects a, m, p, and e were assumed to have variance matrices equal to $A\sigma_a^2$,
$A\sigma_m^2$, $I\sigma_p^2$, and $I\sigma_e^2$, respectively, where A is the additive numerator relationship matrix
among animals and I is an identity matrix. Covariance between a and m was assumed
equal to $A\sigma_{am}$. The estimates of $\sigma_a^2$, $\sigma_m^2$, $\sigma_{am}$, $\sigma_p^2$, and $\sigma_e^2$ used in the analyses were those
reported in Chapter 3. Homogeneity of variances, the same dominance and epistatic
loss effects for crosses of different pairs of breeds, and no interactions between genetic
and environmental effects were assumed.
The solutions for the genetic model (1) were obtained through the following
procedure:
Step 1: Obtain solutions for v, a, m, and p, using the model
$y_1 = Fv + Za + Wm + Sp + e_1$,
where $y_1 = y - X\hat{b}$. In the first iteration $\hat{b}$ was set to values obtained by LS. The DMU
program (Madsen and Jensen, 2000) was used.
Step 2: Using the ridge regression technique, obtain solutions for b, using the model
$y_2 = Xb + e_2$,
where $y_2 = y - F\hat{v} - Z\hat{a} - W\hat{m} - S\hat{p}$, and $\hat{v}$, $\hat{a}$, $\hat{m}$, and $\hat{p}$ are solutions obtained in
the first step. The programs to run step 2 were developed using the Fortran language and
the IML procedure of SAS statistical software (SAS Institute Inc., 1990).
Steps 1 and 2 were repeated until convergence. Convergence was attained
when the largest absolute difference between the solutions for b in the current and in the
previous iteration was smaller than $10^{-4}$. The final estimates for b obtained in the second
step are equal to estimates obtained in a model where all effects are solved
simultaneously. However, the standard errors of the estimates are smaller because the
number of parameters estimated in the sub-model is smaller than would be in the full
model.
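The alternating structure of steps 1 and 2 can be illustrated with a deliberately simplified sketch. It assumes only two blocks of fixed effects (X and a stand-in F), uses plain least squares in both steps, and omits the random effects, the DMU program, and the ridge penalty; all data are simulated. Under these assumptions the alternating solutions approach those of the joint least squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 3))     # stand-in for fixed genetic covariates
F = rng.normal(size=(n, 2))     # stand-in for all remaining effects
y = X @ np.array([2.0, -1.0, 0.5]) + F @ np.array([1.0, 3.0]) + rng.normal(size=n)


def least_squares(M, target):
    """Least squares solution of M @ coef = target."""
    return np.linalg.lstsq(M, target, rcond=None)[0]


b = least_squares(X, y)                      # starting values for b
for iteration in range(1000):
    v = least_squares(F, y - X @ b)          # step 1: other effects given b
    b_new = least_squares(X, y - F @ v)      # step 2: b given step-1 solutions
    converged = np.max(np.abs(b_new - b)) < 1e-10
    b = b_new
    if converged:
        break

# Reference: all effects solved simultaneously
b_joint = least_squares(np.column_stack([X, F]), y)[:3]
print(iteration, np.max(np.abs(b - b_joint)))
```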
Ridge regression
The usual model for a multiple linear regression is
$y = Xb + \varepsilon$,
where y is a (n × 1) vector of observations, X is a (n × p) design matrix of rank p, and $\varepsilon$
is a (n × 1) vector of random residuals with assumptions $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = I\sigma^2$. The
unknown parameter vector, b, using the least squares criterion, is estimated by solving
$(X'X)\hat{b} = X'y$ to give $\hat{b} = (X'X)^{-1}X'y$. Estimates and corresponding variances could be
unreliable in the presence of multicollinearity. The ridge regression estimator consists of
adding a small positive amount to the diagonal of the X'X matrix, causing a reduction in
the variance of the estimates at the expense of introducing some bias. Thus, the ridge
regression estimator of b takes the general form
$\hat{b}_k = (X'X + K)^{-1}X'y$,
where $K = \mathrm{diag}(k_1, k_2, \ldots, k_p)$, $k_i \ge 0$. When all $k_i$ elements are equal to zero, $\hat{b}_k$ reduces
to the LS estimator.
The variance-covariance matrix of $\hat{b}_k$ is
$Var(\hat{b}_k) = (X'X + K)^{-1}X'X(X'X + K)^{-1}\sigma^2$.
The mean square error (MSE), which is a measure of the expected squared distance
of $\hat{b}_k$ to b, is
$MSE = E[(\hat{b}_k - b)'(\hat{b}_k - b)] = \mathrm{trace}[Var(\hat{b}_k)] + b'(Z - I)'(Z - I)b$,
that is,
MSE = Total variance + (Bias)$^2$,
where $Z = (X'X + K)^{-1}X'X$.
The variance inflation factors of the ridge regression coefficients are diagonal
elements of the matrix $(X'X + K)^{-1}X'X(X'X + K)^{-1}$.
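A minimal NumPy sketch of the generalized ridge estimator and its variance inflation factors is shown below, assuming the standardized (correlation) form of the model. The function names, the ridge values, and the simulated data are illustrative assumptions.

```python
import numpy as np


def standardize(X, y):
    """Centre and scale predictors so that Xs'Xs is the correlation matrix."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc**2).sum(axis=0))
    return Xs, y - y.mean()


def generalized_ridge(Xs, yc, k):
    """Ridge solution and ridge VIFs for a vector k of ridge parameters."""
    XtX = Xs.T @ Xs
    inv = np.linalg.inv(XtX + np.diag(k))
    b_k = inv @ Xs.T @ yc
    vif = np.diag(inv @ XtX @ inv)
    return b_k, vif


# Simulated collinear predictors and response
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - x2 + 0.5 * x3 + rng.normal(size=500)

Xs, yc = standardize(X, y)
b_ls, vif_ls = generalized_ridge(Xs, yc, np.zeros(3))           # K = 0 gives LS
b_ridge, vif_ridge = generalized_ridge(Xs, yc, np.full(3, 0.05))
print(vif_ls, vif_ridge)
```

With all ridge parameters set to zero the function returns the LS solution and the ordinary VIFs, so the same code serves for both estimators.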
In the present study, the ridge regression analyses were carried out in the standardized
form of the model using the correlation matrix. After estimation, the estimates were
transformed to the original scale and were presented in this way.
Objective methods for selecting the ridge parameter K
The optimal value of the ridge parameter K, which results in smaller MSE than that
obtained with LS, depends on the unknown parameter vector b and the unknown error
variance $\sigma^2$ (Hoerl and Kennard, 1970a). As a consequence, K must be determined
empirically or estimated from the data. In this study the ridge parameter K was estimated
through two objective methods.
(1) Generalized Ridge Estimator of Hoerl and Kennard (R1)
In the Generalized Ridge Regression Estimator of Hoerl and Kennard (Hoerl and
Kennard, 1970a), an orthogonal transformation V is applied to reduce X'X to a diagonal
matrix. We have that
$V'(X'X)V = \Lambda$,
where V is a (p × p) orthogonal matrix whose columns $v_1, v_2, \ldots, v_p$ are the eigenvectors
of X'X and $\Lambda$ is a diagonal matrix of eigenvalues of X'X. Writing
$X^* = XV$
and
$\alpha = V'b$,
then the model $y = Xb + \varepsilon$ can be written as
$y = X^*\alpha + \varepsilon$,
where
$(X^*)'(X^*) = \Lambda$.
The generalized ridge regression procedure is then defined as
$\hat{\alpha}_k = [(X^*)'(X^*) + K]^{-1} (X^*)'y$,
where
K is a diagonal matrix with non-negative diagonal elements $k_1, k_2, \ldots, k_p$.
Hoerl and Kennard (1970a) showed that theoretical optimal values for $k_i$ are given by
$k_i = \sigma^2/\alpha_i^2$. The authors suggested an iterative procedure to estimate $k_i$. This procedure
may be summarized as follows.
1. Reduce the system to canonical form.
2. Take the least squares estimates as the starting point to compute
$\hat{k}_i^{(0)} = \hat{\sigma}^2 / (\hat{\alpha}_i^{(0)})^2$, i = 1, 2, … , p, where $\hat{\sigma}^2$ is the LS estimator of $\sigma^2$ and the
superscript (j) denotes the jth iteration.
3. Use the $\hat{k}_i^{(j)}$ values in the ridge regression equation to obtain $\hat{\alpha}_i^{(j+1)}$.
4. Compute a new estimate for $k_i$ using $\hat{k}_i^{(j+1)} = \hat{\sigma}^2 / (\hat{\alpha}_i^{(j+1)})^2$.
Go to step 3 until convergence of $\hat{k}_i$. Convergence was achieved when the
differences between the $\hat{k}_i$ of two successive iterations were smaller than $10^{-7}$. After
convergence the estimates $\hat{\alpha}_k$ were converted back to $\hat{b}_k$ through the equation
$\hat{b}_k = V\hat{\alpha}_k$.
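The iterative procedure of method R1 can be sketched in NumPy as follows. This is an illustration on simulated, centred data without an intercept, not the Fortran and SAS IML programs used in the study, and the function name is hypothetical. For a coefficient whose least squares signal is weak relative to the residual variance, the iteration can push its ridge parameter upward without bound, which shrinks that coefficient to zero; the sketch simply allows this to happen.

```python
import numpy as np


def hoerl_kennard_ridge(X, y, tol=1e-7, max_iter=1000):
    """Iterative generalized ridge estimate in canonical form (sketch)."""
    n, p = X.shape
    eigvals, V = np.linalg.eigh(X.T @ X)     # V'(X'X)V = diag(eigvals)
    X_star = X @ V                           # canonical predictors
    xty = X_star.T @ y
    alpha_ls = xty / eigvals                 # LS solution in canonical form
    residual = y - X_star @ alpha_ls
    sigma2 = residual @ residual / (n - p)   # LS estimate of residual variance

    # Ignore floating-point warnings raised when a k_i grows to infinity
    with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
        k = sigma2 / alpha_ls**2             # starting values from LS
        for _ in range(max_iter):
            alpha = xty / (eigvals + k)      # ridge solution for current k
            k_new = sigma2 / alpha**2        # updated ridge parameters
            converged = np.allclose(k_new, k, rtol=0.0, atol=tol)
            k = k_new
            if converged:
                break
        alpha = xty / (eigvals + k)
    return V @ alpha, k                      # back-transform to b


# Simulated, centred data
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
X -= X.mean(axis=0)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)
y -= y.mean()

b_ridge, k = hoerl_kennard_ridge(X, y)
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_ls, b_ridge, k)
```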
(2) Bootstrap in combination with cross-validation (R2)
The bootstrap and cross-validation approach for estimating the ridge parameter, originally
suggested by Delaney and Chatterjee (1986), was extended to consider the instability of
each predictor variable. The elements $k_i$ of the ridge parameter K were estimated by
$\hat{k}_i = \theta_b \times \frac{VIF_i}{\max(VIF)}$,
where $VIF_i$ is the variance inflation factor of the ith predictor variable. A value $\theta_b$ has to
be chosen to generate a K matrix that minimizes the mean squared error of prediction
(MSEP). The magnitude of the elements $\hat{k}_i$ of the ridge parameter matrix K will be
proportional to the variance inflation of each predictor variable. As a result, unnecessary
bias will not be imposed on those predictors not seriously involved in multicollinearity.
The MSEP was estimated by combining bootstrap with cross-validation (described later).
The bootstrap is a powerful resampling procedure originally proposed by Efron (1979). In
the bootstrap procedure, a random sample of n observations is drawn with replacement from
a particular population. This sample will contain observations that were chosen more than
once and observations that were not chosen. The sample obtained in this manner is known
as a bootstrap sample. If a large number of bootstrap samples is drawn, the estimates of
the parameters of interest will approach the true parameter.
A strategy using bootstrap in combination with cross-validation to estimate the ridge
parameter matrix K can be summarized as follows.
1. Select a vector containing values of $\theta$ between 0 and 1.
2. Choose a bootstrap sample of n observations with replacement.
3. For each bootstrap sample and each value of $\theta$ obtain $\hat{K}_\theta$ and the ridge estimator
vector $\hat{b}_k$, where $\hat{K}_\theta = \mathrm{diag}(\hat{k}_1, \hat{k}_2, \ldots, \hat{k}_p)$. Use the ridge estimator to predict
observations that were not chosen in the bootstrap sample. If the prediction vector for the
unchosen observations is $\hat{y}_k(\hat{K}_\theta)$, the MSEP of the jth bootstrap sample and $\hat{K}_\theta$ ridge
parameter, given $\theta$, is
$MSEP_j(\hat{K}_\theta) = [\hat{y}_k(\hat{K}_\theta) - y]' [\hat{y}_k(\hat{K}_\theta) - y] / N_j$,
where $N_j$ is the number of unchosen observations (randomly determined) in the jth
bootstrap sample.
4. Repeat steps 2 and 3 for B bootstrap samples and obtain a final average of MSEP
for each $\theta$ value as
$MSEP(\hat{K}_\theta) = \frac{\sum_{j=1}^{B} [MSEP_j(\hat{K}_\theta)](N_j)}{\sum_{j=1}^{B} N_j}$.
A value of $\theta$ that generates a matrix K of ridge parameters that minimizes the MSEP
is then chosen ($\theta_b$). The MSEP were obtained for values of $\theta$ ranging from zero to one,
with increments of 0.001, on the basis of one hundred bootstrap samples with
replacement.
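The bootstrap and cross-validation strategy above can be sketched as follows. Several details are assumptions made for illustration: the data are simulated, the VIFs are recomputed within each bootstrap sample, predictors are standardized using the bootstrap sample only, and a coarse grid and fewer bootstrap samples are used to keep the run short. The names `select_ridge_scalar` and `grid` are hypothetical.

```python
import numpy as np


def select_ridge_scalar(X, y, grid, n_boot=100, seed=0):
    """Choose the scalar that minimizes the bootstrap cross-validated MSEP."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sse = np.zeros(len(grid))     # sum over samples of MSEP_j * N_j
    n_unchosen = 0                # sum over samples of N_j
    for _ in range(n_boot):
        chosen = rng.integers(0, n, size=n)              # sample with replacement
        left_out = np.setdiff1d(np.arange(n), chosen)    # unchosen observations
        Xb, yb = X[chosen], y[chosen]
        mean_x, mean_y = Xb.mean(axis=0), yb.mean()
        scale = np.sqrt(((Xb - mean_x) ** 2).sum(axis=0))
        Xs = (Xb - mean_x) / scale                       # correlation form
        yc = yb - mean_y
        XtX = Xs.T @ Xs
        vif = np.diag(np.linalg.inv(XtX))
        X_out = (X[left_out] - mean_x) / scale
        for t, value in enumerate(grid):
            k = value * vif / vif.max()                  # k_i scaled by VIF_i
            b_k = np.linalg.solve(XtX + np.diag(k), Xs.T @ yc)
            error = (X_out @ b_k + mean_y) - y[left_out]
            sse[t] += error @ error
        n_unchosen += len(left_out)
    msep = sse / n_unchosen                              # weighted average MSEP
    return grid[np.argmin(msep)], msep


# Simulated collinear data
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - x2 + 0.5 * x3 + rng.normal(size=200)

grid = np.linspace(0.0, 1.0, 21)
best, msep = select_ridge_scalar(X, y, grid, n_boot=20)
print(best, msep.min())
```

Accumulating squared prediction errors and dividing by the total number of unchosen observations is equivalent to the weighted average of the per-sample MSEP values given in step 4.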
Mean squared error of prediction and variance inflation factor
After the ridge parameter matrix K had been selected and the ridge regression estimates
obtained for each ridge regression method, one hundred bootstrap samples with replacement
were generated and used to compute the average MSEP and VIF. Ridge regression
methods and LS were compared with respect to average MSEP and VIF. A model that
results in lower VIF and smaller MSEP is desirable because these statistics indicate
stability in the estimates and ability of the model to predict future observations.
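For illustration, variance inflation factors under LS and under a generalized ridge estimator can be computed from the correlation-form matrix as sketched below. This section does not state the VIF formula used for the ridge estimates, so the ridge version here is an assumption: it takes the diagonal of (R + K)^-1 R (R + K)^-1, which is proportional to the covariance matrix of the ridge estimator when predictors are in correlation form. The function names are hypothetical.

```python
import numpy as np


def vif_ls(R):
    """LS variance inflation factors: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(R))


def vif_ridge(R, K):
    """Ridge variance inflation factors (assumed definition).

    Diagonal of (R + K)^-1 R (R + K)^-1, where R is X'X in correlation
    form and K is the diagonal matrix of ridge parameters.
    """
    A = np.linalg.inv(R + K)
    return np.diag(A @ R @ A)
```

With K equal to a zero matrix the two functions agree, which is a quick check that the ridge version reduces to LS.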
Bias measurement
Given that $E(\hat{b}) = b$ and $E(\hat{b}_k) = (X'X + K)^{-1}X'Xb = Hb$, a measurement of the bias
of the ridge regression vector $\hat{b}_k$ was obtained by
\[ \left(1 - \frac{\lVert H \rVert}{\lVert I \rVert}\right) \times 100, \]
where || || denotes the Euclidean norm. Thus, a bias measurement close to zero for a
particular ridge regression method will indicate little bias in the estimates.
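A minimal sketch of this bias measurement follows. It assumes that the Euclidean norm of a matrix is read as the Frobenius norm (the default of numpy.linalg.norm for two-dimensional arrays); the function name is hypothetical.

```python
import numpy as np


def bias_measurement(XtX, K):
    """Return (1 - ||H|| / ||I||) * 100 with H = (X'X + K)^-1 X'X.

    ASSUMPTION: the Euclidean norm of a matrix is taken as the Frobenius norm.
    """
    H = np.linalg.solve(XtX + K, XtX)
    I = np.eye(XtX.shape[0])
    return (1.0 - np.linalg.norm(H) / np.linalg.norm(I)) * 100.0
```

When K is a zero matrix, H equals I and the measurement is zero, which corresponds to the unbiased LS estimator.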
Comparison of across-breed estimated breeding values
Across-breed estimated breeding values (ABC) from models that used LS and ridge
regression methods for estimation of fixed genetic effects were compared through
correlations (Pearson and Spearman), and percentages of coincidence for different
proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves. ABC were
calculated by adding EBV and estimates of direct breed additive effects, weighted by the
breed composition of the animal. The following models were considered in the
comparison.
(1) Additive-dominance models
AD-AH: The pre-weaning gain was pre-adjusted for expected heterosis based on
averages from literature. An ad hoc heterosis (direct and maternal) of 5% for an animal
with heterozygosity of 100% was assumed. Breed additive effects were estimated through
LS.
AD-LS: The pre-weaning gain was adjusted for dominance (heterosis) effects using
information from the dataset under investigation. Breed additive and dominance effects
were estimated through LS. Estimates of direct and maternal heterosis were equal to
1.31% and 1.84%, respectively.
AD-R2: This model differed from model AD-LS by the fact that breed additive and
dominance (heterosis) effects were estimated through R2 instead of LS. Estimates of
direct and maternal heterosis were equal to 1.22% and 1.23%, respectively.
(2) Additive-dominance-epistatic models
ADE-LS: Breed additive, dominance and epistatic loss effects were estimated using
LS. Estimates of direct and maternal dominance were equal to 1.31% and 2.28%,
respectively, whereas estimates of direct and maternal epistatic loss were equal to –2.19%
and –0.08%, respectively.
ADE-R1: Breed additive, dominance and epistatic loss effects were estimated using
R1. Estimates of direct and maternal dominance were equal to 1.31% and 1.72%,
respectively, whereas estimates of direct and maternal epistatic loss were equal to –1.04%
and –0.05%, respectively.
ADE-R2: Breed additive, dominance and epistatic loss effects were estimated using
R2. Estimates of direct and maternal dominance were equal to 1.23% and 1.55%,
respectively, whereas estimates of direct and maternal epistatic loss were equal to –0.66%
and –0.03%, respectively.
ABC from model ADE-R2 were taken as the reference estimates for calculating
Pearson and Spearman correlations, and percentages of coincidence with all other models.
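The comparison criteria described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: all function names are hypothetical, breed_comp is assumed to hold one row of breed fractions per animal, coincidence is returned as a proportion (as the values are reported in the Results), and the simple rank-based Spearman correlation assumes there are no tied values.

```python
import numpy as np


def across_breed_ebv(ebv, breed_comp, breed_effects):
    """ABC = EBV + sum over breeds of (breed fraction * direct breed additive effect)."""
    return ebv + breed_comp @ breed_effects


def coincidence(ref, other, top):
    """Proportion of the top fraction under `ref` that is also in the top fraction under `other`."""
    n_sel = max(1, int(round(top * len(ref))))
    top_ref = set(np.argsort(-ref)[:n_sel])
    top_other = set(np.argsort(-other)[:n_sel])
    return len(top_ref & top_other) / n_sel


def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]


def spearman(a, b):
    """Pearson correlation of ranks (assumes no ties)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]
```

Calling coincidence(abc_reference, abc_other, 0.01) would give the proportion of the top 1% of animals under the reference model that are also selected under the other model.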
RESULTS
Multicollinearity diagnostics
The matrices X'X and X'y in the correlation form are presented in Table 4.1.
Coefficients of maternal dominance and direct epistatic loss effects were strongly
correlated (r = 0.95). Within a breed, correlations between coefficients of direct and
maternal breed additive effects were always equal to or higher than 0.80. The severity of
multicollinearity, however, should not be quantified by the magnitude of these pairwise
correlations because the interrelation among three or more variables might result in a high
degree of multicollinearity, even when pairwise correlations are low. Better measures of
the degree of multicollinearity are given by the eigenvalues of the correlation matrix and
corresponding condition indices (Table 4.2), variance inflation factors (Figure 4.1), and
variance-decomposition proportions associated with the eigenvalues (Table 4.3).
Eigenvalues and corresponding condition indices are presented in Table 4.2. The last
eigenvalue was very small (λ = 0.00189). This eigenvalue was associated with condition
index 38.85, reflecting dependencies between predictor variables. The second smallest
eigenvalue was equal to 0.05078, with corresponding condition index equal to 7.50.
Variance inflation factors shown in Figure 4.1 indicate that the variance of the LS
estimates of 16 out of 24 regression coefficients would be inflated by more than 10 fold
(VIF larger than 10) compared to what would be expected in an orthogonal system.
Variance-decomposition proportions associated with the largest condition index (CI =
38.85) suggest that breed composition was the main candidate for the dependencies
(Table 4.3). For 9 direct and 5 maternal breed additive effects, a fraction of the variance
of the estimated regression coefficients larger than 50% was associated with
dependencies indicated by the largest condition index.
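The diagnostics discussed above can be sketched as follows, assuming the usual definitions: the condition index of an eigenvalue is the square root of the largest eigenvalue divided by that eigenvalue, and the variance-decomposition proportions are the squared eigenvector elements divided by the eigenvalue, normalized within each coefficient. The reported values are consistent with this definition of the condition index, since 38.85² × 0.00189 and 7.50² × 0.05078 both give a largest eigenvalue of about 2.85. The function name is hypothetical.

```python
import numpy as np


def collinearity_diagnostics(R):
    """Eigenvalues, condition indices, and variance-decomposition proportions.

    R is X'X in correlation form. proportions[i, j] is the share of the
    variance of coefficient i associated with eigenvalue j.
    """
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]          # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]

    cond_index = np.sqrt(eigval[0] / eigval)

    phi = eigvec ** 2 / eigval                # phi[i, j] = v_ij^2 / lambda_j
    proportions = phi / phi.sum(axis=1, keepdims=True)
    return eigval, cond_index, proportions
```

A coefficient whose row has a proportion above 0.5 in the column of the largest condition index would be flagged in the same way as the breed additive effects in Table 4.3.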
Combining information from Table 3.7 and Figure 3.3 (Chapter 3) with information
from Table 4.3, breeds with smaller numbers of records and higher standard errors for the
estimated regression coefficients, which included BD, GV, MA, SA, and SH, had a lower
proportion of the variance of the estimates associated with linear dependences among
predictor variables. In contrast, breeds with larger numbers of records and lower
standard errors for the estimated regression coefficients, including AN, CH, HE, LM, and
SM, had a higher proportion of the variance of the estimates associated with linear
dependences among predictor variables.
The second largest condition index (CI = 7.50) points to possible dependencies
involving maternal dominance and direct epistatic loss effects (Table 4.3). For maternal
dominance and direct epistatic loss effects, 85% and 83% of the variances of the estimated
regression coefficients, respectively, were associated with linear dependences between the
corresponding predictor variables.
Ridge parameter K
The ridge parameters obtained by the two objective methods are shown in Table 4.4.
The selected constant θb for calculating the ridge parameter K in R2, which minimizes the
MSEP, was equal to 0.04. The average and the standard deviation of the number of
unchosen observations over the bootstrap samples in the last iteration for solving the
genetic model (1) were equal to 176,002 and 257, respectively. The elements of the ridge
parameter K obtained on the basis of R1 were generally smaller than those on the basis of
R2.
Convergence of estimates of fixed genetic effects
Estimates of direct and maternal breed additive, dominance, and epistatic loss effects
after each iteration for solving the genetic model (1), under both ridge regression
methods, are depicted in Figures 4.2 and 4.3, respectively. To minimize the number of
iterations needed to achieve convergence, the least squares estimates were used as the
starting point for R1 and R2. Estimates of fixed genetic effects converged faster under R2
(60 iterations) than under R1 (135 iterations). The slower convergence of R1 was due to
direct and maternal breed additive effects of SA. In general, estimates of direct and
maternal effects moved in opposite directions in the first iterations before stabilizing.
Mean squared error of prediction and variance inflation factor
Table 4.5 shows the average MSEP and VIF obtained over one hundred bootstrap
samples under LS and ridge regression methods. Both ridge regression methods
outperformed the LS estimator with regard to MSEP and VIF. The MSEP of the two ridge
regression methods were similar and 3% lower than the MSEP of LS. Ridge
regression methods were also superior to LS with respect to reduction in
VIF. The average VIF of LS estimates was equal to 26.81. This value was reduced to 6.10
and 4.18 by R1 and R2, respectively. Under R1, only two regression coefficient estimates
still had VIF larger than 10, whereas under R2 all VIF were lower than 10.
The last two rows in Table 4.5 are the total variance and the square of the estimates.
Methods used to deal with multicollinearity problems typically generate predictors with
smaller variance and smaller range of the predictor vector in comparison to the LS
estimator. These two statistics are influenced by the magnitude of the ridge parameter.
Small ridge parameters imply less restriction (shrinkage) on the size of regression
coefficients, while large values of ridge parameters imply an a priori belief that estimates
of regression coefficients should be smaller or restricted.
From Table 4.5, the two ridge regression methods provided a general improvement
over LS when evaluated by MSEP and average VIF obtained over a large number of
bootstrap samples. Additional information for comparing the ridge regression methods,
based on reduction of instability of each parameter estimate, is presented in Figure 4.4.
When multicollinearity was of concern, both ridge regression methods caused substantial
reduction in the VIF, but VIF given by R2 were smaller than VIF given by R1 for most
predictor variables.
Bias measurement
A known relationship between the ridge parameter and both variance and bias of the
ridge regression estimates is that, as the ridge parameter increases, the variance decreases
and the bias increases. The ridge parameters obtained on the basis of R2 were generally
larger than the ridge parameters obtained on the basis of R1 (Table 4.4). As a
consequence, larger bias in the estimates of regression coefficients of R2, compared to
R1, can be expected. Bias measurements of R1 and R2 were equal to 1.49% and 5.61%,
respectively.
Dominance and epistatic loss effects
Estimates of dominance and epistatic loss effects and respective standard errors
obtained by LS and by ridge regression methods are presented in Table 4.6. For ease of
comparison, estimates and corresponding standard errors are also displayed in Figures 4.5
and 4.6, respectively. Estimates of direct and maternal dominance effects were obtained
as partial regressions on predictor variables HD and HM, while estimates of direct and
maternal epistatic loss effects were obtained as partial regressions on predictors ED and
EM, respectively. Estimates of dominance and epistatic loss effects were of opposite sign.
Both direct and maternal dominance effects resulted in a favourable effect on pre-
weaning gain. Direct and maternal epistatic loss effects decreased the pre-weaning gain.
The estimate of maternal epistatic loss, however, was not statistically different from zero
(P > 0.05).
Because predictor variables HM and ED were involved in multicollinearity, ridge
regression methods caused substantial changes in the estimates of maternal dominance
and direct epistatic loss effects. A small reduction in the standard errors of estimates of
maternal dominance and direct epistatic loss effects was obtained through ridge regression
methods. This reduction was slightly more pronounced under the R2 method.
Breed additive effects
Estimates and standard errors of direct and maternal breed additive effects, as
deviations from AN, are presented in Table 4.7. Estimates and standard errors of direct
and maternal breed additive effects are also depicted in Figures 4.5 and 4.6, respectively.
Estimates of direct breed additive effects showed large standard errors under LS
(Figure 4.5). Ridge regression methods substantially reduced the standard errors when
predictor variables were more associated with multicollinearity. In the extreme case of
multicollinearity indicated by the largest VIF in Figure 4.1, which corresponds to the
HE breed, the standard error of the estimate of the direct breed additive genetic effect was
reduced from 0.63 under LS to 0.20 under R1 and to 0.10 under R2.
Estimates of maternal breed additive effects had a different pattern in comparison to
direct effects. The ridge regression estimates of maternal breed effects of GV, HE, MA,
and SM were of larger magnitude than ordinary least squares estimates (Figure 4.6). The
standard errors given by ridge regression methods, however, were always smaller than
standard errors given by LS. Increasing the ridge parameter K indefinitely in the ridge
regression analysis will force all coefficients to zero, but for small values of ki it is not
uncommon to see a regression coefficient increase in absolute value as ki increases
(Marquardt and Snee, 1975).
Figures 4.5 and 4.6 show that estimates of direct and maternal breed additive effects
of BD, GV, MA, SA, and SH still had relatively large standard errors under ridge
regression methods in comparison to the remaining breeds. It was previously shown,
however, that variance-decomposition proportions of maternal breed additive effects for
BD, GV, MA, SA, and SH associated with the largest condition index were lower than
0.5 (Table 4.3). Thus, the large standard errors of the estimates of maternal breed effects
for BD, GV, MA, SA, and SH are more likely a consequence of the relatively small
number of observations in these breeds rather than due to multicollinearity involving the
corresponding predictor variables.
The use of ridge regression methods caused changes in the contrasts between
estimates of breed effects, which will ultimately be reflected in how breeds rank. Because
breed estimates are part of across-breed estimated breeding values used as a criterion of
selection in the beef industry, the choice of a particular model may have practical
implications.
Sampling correlations
To assess the degree of confounding among estimates under LS, R1, and R2,
sampling correlations among estimates of breed additive,
dominance, and epistatic loss effects were calculated. Overall averages of absolute values
of pairwise correlations between estimates under LS, R1, and R2 were equal to 0.49, 0.30,
and 0.18, respectively. These correlations indicated a substantial reduction in the degree
of overall association between estimates given by ridge regression methods, especially
R2, compared with LS. The reduction in the degree of association between
estimates was more pronounced between estimates of direct and maternal breed effects
involving different breeds than between direct and maternal breed effects for the same
breed.
Figure 4.7 shows correlations between estimates of maternal dominance and direct
epistatic loss effects and between direct and maternal breed additive effects for the same
breed. Averages of these correlations were equal to –0.88, –0.79, and –0.74 under LS, R1,
and R2, respectively. Under ridge regression methods, breeds more involved in
multicollinearity showed a substantial reduction in the degree of confounding between
estimates of direct and maternal breed additive effects, notably under R2. In
contrast, estimates of maternal dominance and direct epistatic loss effects were still
highly correlated under both ridge regression methods. The correlation between estimates
of maternal dominance and direct epistatic loss effects was 0.94 under LS and 0.93 under both
ridge regression methods.
Comparison of across-breed estimated breeding values
Comparisons of ABC from additive-dominance models AD-AH, AD-LS, AD-R2,
and the additive-dominance-epistatic models ADE-LS and ADE-R1 with ABC from
additive-dominance-epistatic model ADE-R2 with respect to Pearson and Spearman
correlations, and percentages of coincidence for different proportions of selected sires,
dams, and calves are depicted in Figure 4.8. Overall Pearson and Spearman correlations
between ABC were high, ranging from 0.85 to 1.0. ABC from model AD-R2 were
perfectly correlated with ABC from model ADE-R2. Even when only 1% of top animals
were compared on the basis of ABC, percentages of coincidence between AD-R2 and
ADE-R2 were equal to or higher than 0.99. Both models AD-R2 and ADE-R2 used ridge
regression method R2 for estimating the fixed genetic effects, but model AD-R2 did not
include epistatic loss effects. Thus, the inclusion of epistatic loss in the genetic model did
not cause re-ranking of sires, dams, and calves. This observation is also corroborated by
the fact that Pearson and Spearman correlations and percentages of coincidences of
models AD-LS and ADE-LS with model ADE-R2 were very similar. Fixed genetic
effects of both models AD-LS and ADE-LS were estimated using LS, but the models differed
in that model AD-LS did not include epistatic loss effects.
When the highest 1% ABC of sires, dams, and calves under model ADE-R2 and
under models AD-AH, AD-LS, ADE-LS, and ADE-R1 were compared, percentages of
coincidence were much lower than the overall Pearson and Spearman correlations,
especially for model AD-AH (0.66, 0.65, and 0.61 for sires, dams, and calves,
respectively). These results point to important re-ranking of top animals. Among the 1%
best sires, dams, and calves, 34%, 35%, and 39% of those selected based on ADE-R2 would
not be selected based on model AD-AH, which assumed an ad hoc heterosis of 5% for both
direct and maternal effects and did not account for multicollinearity among predictor
variables. As the percentage of selected animals under model AD-AH increased to 40%,
the percentages of coincidence with model ADE-R2 increased to 0.78 for sires, 0.81 for
dams, and 0.78 for calves. Higher percentages of coincidence between models AD-AH
and AD-LS (not shown) than between models AD-AH and ADE-R2 suggested that
practical differences between models AD-AH and ADE-R2 arose predominantly from
differences in breed additive effects rather than from different non-additive effects. When
the 1%, 10%, 20%, and 40% highest ABC of calves under models AD-AH and AD-LS were
compared, percentages of coincidence were 0.81, 0.87, 0.92, and 0.93, respectively.
Models AD-LS and ADE-LS were similarly correlated with model ADE-R2.
Compared to model AD-AH, models AD-LS and ADE-LS had larger percentages of
coincidence with model ADE-R2. However, the difference with model ADE-R2 was still
substantial. Among the 1% highest ABC, approximately 30% of selected animals, based
on ADE-R2, would not be selected based on models AD-LS and ADE-LS. Among the
40% highest ABC under model ADE-R2, approximately 20% of selected animals would
not be selected based on models AD-LS and ADE-LS. Model ADE-R1 showed a larger
percentage of coincidence with model ADE-R2 than models AD-AH, AD-LS, and ADE-LS,
but differences from ADE-R2 were still considerable. These results confirm that the
choice of the method to estimate the ridge parameter has consequences for genetic
selection, resulting in different ranking of animals on the basis of across-breed estimated
breeding values.
Use of the same ridge parameter in subsequent genetic evaluations
The inclusion of a small bias in the estimates of fixed genetic effects through the
ridge parameter K of both ridge regression methods resulted in smaller average MSEP
and VIF compared with LS. Because the optimal value of the ridge parameter K
depends on the unknown parameter vector b and the unknown error variance σ², in practice K
must be determined empirically or estimated from the data. In this study, K was
determined from the data using two different objective methods. Estimating K as
often as genetic evaluations are run might increase computational
demand when very large datasets are used. Thus, it is worth investigating whether
the same ridge parameter could produce stable estimates of fixed genetic effects in
subsequent genetic evaluations, when more records are added to the data file. This is
equivalent to investigating how much K changes when more records are added to the data
file.
A simulation of data accumulation was performed. The ridge parameter K was
determined using records from 1986 (first year with available records) to 1996 and was
used in the estimation of fixed genetic effects when records of subsequent years (1997,
1998, and 1999) were added to the data file. Ridge regression methods and LS were then
compared with respect to stability of estimates over years. Table 4.8 presents the number
of calves from 1986 to the mentioned year, expressed as equivalent purebred calves. The
percentages of the total number of calves per breed in 1996, which were used to
determine the ridge parameter, varied from 58.87% (GV) to 95.74% (HE). The number of
calves showed large variation among breeds. GV and HE represented the smallest and the
largest number of calves, respectively. The ridge parameter K obtained by the two
objective methods, using records from 1986 to 1996, is shown in Table 4.9. The ridge
parameter determined by R2 did not greatly differ from the ridge parameter determined
using the entire dataset (Table 4.4). However, noticeable changes in the ridge parameter
determined by R1 were observed, particularly for predictor variables of maternal breed
additive effects of SA, GV, MA, and SM. An attempt was made to estimate the ridge
parameter using records from 1986 to 1995, but maternal breed effect of GV did not
converge with R1 after 200 iterations, likely due to the small number of records for this
breed.
Estimates of breed additive, dominance, and epistatic loss genetic effects under LS,
R1, and R2, from 1996 to 1999, were used to construct Figures 4.9, 4.10, and 4.11,
respectively. Estimates given by both ridge regression methods were less sensitive to
inclusion of new records than estimates given by LS, regardless of the fact that the ridge
parameter was determined using records from 1986 to 1996. Estimates under R2 were
more stable across years than estimates under R1.
DISCUSSION
Breed additive, dominance, and epistatic loss effects are generally estimated using a
multiple regression equation, where breed composition, breed heterozygosities and
functions of heterozygosities can be used as predictor variables. Linear relationships
among predictor variables used to estimate breed additive, dominance, and epistatic loss
effects result in multicollinearity. As a consequence, it is difficult to estimate the unique
effect of an individual variable in the regression equation and regression coefficients are
highly unstable. A potential approach to deal with multicollinearity problems is the ridge
regression method.
In the current investigation, multicollinearity diagnostics were performed using
different measures: variance inflation factor, condition indices, and variance-
decomposition proportions associated with eigenvalues. These measures of
multicollinearity were obtained after standardization (centering and scaling) of predictor
variables, as recommended by Marquardt and Snee (1975) and Freund and Littell (2000).
Scaling the predictor variables removes the near dependencies that are due to the scales
on which predictor variables were expressed rather than a real defect of the data, while
centering the predictor variables removes the correlation between the constant term and
all linear terms in a linear model.
Multicollinearity diagnostics suggested that direct and maternal breed effects were
the main candidates for linear dependencies, followed by maternal dominance and direct
epistatic loss effects. The multicollinearity involving breed composition can be partially
explained by the mathematical constraint among breeds, because breed portions in the
breed composition of an animal add to one and the breed composition of a calf is equal to
the average breed composition of the sire and of the dam. In practice, after fitting breeds
that are more representative in the data, less new information is added by fitting the
remaining breeds. Similarly, after fitting the breed of the dam, less new information is
added fitting the breed of the calf, and vice-versa. The other possible multicollinearity
problem, which involved coefficients for maternal dominance effects and direct epistatic
loss effects, can be a consequence of the small proportion of crossbred sires, as shown in
Chapter 3.
Ridge regression models that add the same amount to the diagonal of the matrix X'X
are known in the literature as ordinary ridge regression. A previous analysis using different
ordinary ridge regression methods described by Gruber (1998) resulted in a small reduction
in the variance inflation factors and similar MSEP to LS (not reported), in line with
Delaney and Chatterjee (1986). These authors stated that the ordinary ridge regression
model is not appropriate for multicollinearity caused by physical or mathematical
constraints in the data. Because breed composition sums to one for each observation, a
mathematical constraint was present in the data. Generalized ridge regression methods are
advised to deal with this source of multicollinearity.
With an optimal choice of the ridge parameter matrix K, the ridge estimators have
smaller individual and total MSE than the LS estimators (Hoerl and Kennard, 1970a;
Lowerre, 1974; Gruber, 1998). The optimal K, however, cannot be determined with
certainty because it depends on the unknown parameter vector b and the unknown error
variance σ². As a consequence, K must be determined empirically or estimated from the
data. Thus, K could change as data are accumulated over time.
The performance of ridge regression methods has been generally evaluated in terms
of reduction in MSE compared to LS using computer simulation (Gruber, 1998). A given
simulation, of course, cannot hope to cover a large range of practical situations,
particularly when a large number of factors are involved. In this study, the performance
of ridge regression methods was evaluated in terms of MSEP, as in Delaney and
Chatterjee (1986) and Hébel et al. (1993), under the assumption that smaller MSE will
result in smaller MSEP. A procedure combining bootstrap resampling and cross-
validation was used to obtain the average MSEP over a large number of samples. This
procedure is supported by maximum likelihood principles because sample statistics based
on a large number of bootstrap samples tend to approach the true parameter (Delaney
and Chatterjee, 1986). VIF of the estimates under ridge regression methods and LS were
also obtained and used to evaluate the performance of ridge regression methods.
Results obtained by LS and by ridge regression Methods 1 and 2 were compared
through average MSEP and VIF obtained over one hundred bootstrap samples. The use of
ridge regression resulted in smaller MSEP and VIF than LS. Both ridge regression
methods had similar MSEP (3% lower than under LS), suggesting that specific linear
combinations of estimated regression coefficients were equally determined, even though
individual coefficients differed between methods (Belsley, 1991).
Average VIF given by ridge regression methods 1 and 2 were 77% and 84% lower,
respectively, than average VIF given by LS. Therefore, larger bias in the estimates
of R2 was compensated for by a substantial reduction in the variance of the estimates.
Consequently, estimates obtained by ridge regression methods, notably by R2, will be
less sensitive to small changes in the dataset, such as inclusion of new observations. This
expectation was confirmed when ridge parameters determined using records from 1986 to
1996 were used in the estimation of fixed genetic effects of subsequent years. The
estimates of breed additive, dominance, and epistatic loss effects under ridge regression
methods were more stable over the years than estimates under LS. This observation has
practical implications in routine genetic evaluations: First, more consistency in the across-
breed estimated breeding values can be expected in successive genetic evaluations, which
can foster more confidence among producers in the genetic improvement program, and,
second, the ridge parameter can be estimated less often than genetic evaluations are run,
decreasing the computational demand.
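As a check on the percentage reductions quoted above, they follow directly from the average VIF reported in the Results (26.81 under LS, 6.10 under R1, and 4.18 under R2):

```latex
\[
\frac{26.81 - 6.10}{26.81} \times 100 \approx 77.2\%,
\qquad
\frac{26.81 - 4.18}{26.81} \times 100 \approx 84.4\%.
\]
```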
Estimated direct and maternal dominance effects were favourable, whereas direct and
maternal epistatic loss decreased the pre-weaning gain. Maternal epistatic loss effects,
however, were not statistically different from zero. Dominance effects, represented by
coefficients HD and HM, indicate deviation from average dominance within breed due to
differences in gene frequencies between breeds (breed heterozygosity). Coefficients of
epistatic loss effects, ED and EM, express the recombination loss due to breed
heterozygosity in relation to F2 calves and F2 dams, respectively. According to Koch et
al. (1985), long-term selection within a breed can increase frequencies of favourable non-
allelic combinations, which result in favourable effects on phenotype. When breeds are
crossed, random recombination of loci in the progeny tends to reduce the frequencies of
these parental breed combinations towards Hardy-Weinberg equilibrium, resulting in
recombination loss.
Estimated maternal dominance and direct epistatic loss effects were of opposite sign
and comparable magnitude, and had large standard errors under LS. Both ridge regression
methods R1 and R2 seemed to slightly alleviate the multicollinearity involving maternal
dominance and direct epistatic loss effects. The estimates of maternal dominance and
direct epistatic loss effects were reduced from 2.28% and –2.19% in LS to 1.72% and
–1.04% in the R1, and to 1.55% and –0.66% in the R2, respectively. Sampling
correlations between estimates, however, showed that maternal dominance and direct
epistatic loss effects were still highly confounded under ridge regression methods. These
results suggest that the limited variety of crosses available in the dataset, compounded by
linear dependences between HM and ED, did not provide enough information to effectively
separate maternal dominance and direct epistatic loss effects, regardless of the fact that both
effects were statistically significant.
Estimates of direct and maternal dominance obtained in this study were lower than
the range of heterosis from 3 to 8% (mean = 4%) reported by Long (1980) on pre-
weaning gain. The causes for these low estimates of dominance effects are not clear.
Partial confounding of contemporary group effects with breed composition and breed
heterozygosity effects could be a reasonable explanation of the low estimates of
dominance effects. However, a preliminary analysis to check for connectedness among
contemporary groups across breeds was performed and only connected contemporary
groups with at least two classes of direct or maternal heterozygosities were retained for
the analysis. Additional analyses in which dominance effects were estimated for pairs of
breeds with a large number of records to accommodate possible specific combining
ability between breed pairs, excluding epistatic loss effects in the model, likewise resulted
in low estimates (not reported), in agreement with results obtained by Miller (1996).
Additive-dominance models, which simultaneously estimate additive and heterosis
effects or estimate additive effects after adjusting for heterosis on the basis of expected
breed heterozygosity, are standard models for genetic evaluation in the beef industry.
Comparisons of ABC from additive-dominance models with ABC from additive-
dominance-epistatic models, using either LS or ridge regression methods for estimating
fixed genetic effects, allowed two important practical observations. The first observation
was that the re-ranking of sires, dams, and calves was essentially due to differences in
estimates of breed additive effects rather than differences in non-additive effects. The
second observation was that the inclusion of epistatic loss effects in the additive-
dominance model did not alter the rank of the animals on the basis of ABC. Estimates of
dominance and epistatic loss effects were of low magnitude and showed high degree of
confounding even when ridge regression methods were used.
CONCLUSIONS
Linear dependencies between predictor variables of direct and maternal breed
additive effects and between predictor variables of maternal dominance and direct
epistatic loss effects were the main causes of multicollinearity. Ridge
regression methods outperformed LS with regard to average VIF and MSEP. Estimates
obtained by ridge regression were more stable and could be used with advantage over LS
for prediction purposes. The ridge regression methods were particularly effective in
reducing the degree of multicollinearity involving predictor variables of breed additive
effects. The use of estimates obtained by ridge regression methods instead of estimates
obtained by LS for calculating ABC could increase the probability of properly ranking
animals for across breed comparisons.
The variety of crosses in the dataset provided little opportunity to separate
dominance and epistatic loss effects and to obtain accurate estimates. Due to the high degree
of confounding between estimates of maternal dominance and direct epistatic loss effects,
it was not possible to compare the relative importance of these effects with a high level of
confidence. The inclusion of epistatic loss effects in the standard additive-dominance
model used in genetic evaluation did not cause appreciable re-ranking of animals on the
basis of ABC.
Table 4.1. Correlation coefficients among predictor variables of direct (D) and maternal
(M) fixed genetic effects (n = 478,466)
HD a ED AND BDD CHD GVD HED LMD MAD SAD SHD SMD UND
HD 1.00
ED 0.45 1.00
AND –0.09 –0.08 1.00
BDD 0.00 0.00 –0.05 1.00
CHD 0.06 0.05 –0.18 –0.08 1.00
GVD 0.01 0.02 –0.01 0.00 –0.04 1.00
HED –0.14 –0.20 –0.23 –0.09 –0.29 –0.05 1.00
LMD 0.10 0.10 –0.13 –0.07 –0.22 –0.03 –0.24 1.00
MAD 0.02 0.00 –0.03 –0.02 –0.05 –0.01 –0.07 –0.05 1.00
SAD –0.05 –0.02 –0.04 –0.02 –0.08 –0.01 –0.10 –0.07 –0.01 1.00
SHD 0.03 0.00 –0.06 –0.03 –0.08 –0.01 –0.11 –0.05 0.01 –0.01 1.00
SMD 0.00 0.05 –0.16 –0.07 –0.22 –0.04 –0.26 –0.23 –0.05 –0.07 –0.08 1.00
UND 0.26 0.32 –0.09 0.01 –0.02 0.00 –0.19 0.05 0.01 –0.01 –0.04 –0.03 1.00
HM 0.43 0.95 –0.07 0.00 0.07 0.02 –0.20 0.09 0.00 –0.02 –0.01 0.06 0.29
EM 0.09 0.17 –0.02 0.01 0.01 0.04 –0.15 0.06 0.03 0.02 –0.03 0.10 0.06
ANM –0.08 –0.06 0.90 –0.04 –0.16 0.00 –0.22 –0.08 –0.02 –0.03 –0.07 –0.15 –0.10
BDM –0.06 –0.03 –0.04 0.85 –0.06 0.01 –0.09 –0.05 –0.01 –0.02 –0.02 –0.06 –0.01
CHM –0.05 0.02 –0.14 –0.04 0.81 –0.02 –0.30 –0.12 –0.04 –0.06 –0.09 –0.17 –0.04
GVM –0.02 0.00 –0.01 0.00 –0.03 0.82 –0.04 –0.02 –0.01 –0.01 –0.01 –0.03 –0.01
HEM 0.07 –0.15 –0.24 –0.09 –0.24 –0.05 0.89 –0.18 –0.07 –0.10 –0.13 –0.22 –0.21
LMM –0.11 0.03 –0.09 –0.03 –0.16 –0.01 –0.23 0.80 –0.03 –0.05 –0.05 –0.17 0.00
MAM 0.01 0.00 –0.02 –0.01 –0.04 0.00 –0.07 –0.03 0.83 –0.01 0.00 –0.04 –0.02
SAM –0.08 –0.04 –0.04 –0.02 –0.07 0.00 –0.09 –0.06 –0.01 0.92 –0.02 –0.07 –0.02
SHM 0.09 0.03 –0.06 –0.03 –0.07 –0.01 –0.11 –0.03 0.01 0.00 0.92 –0.07 –0.05
SMM –0.02 0.05 –0.13 –0.05 –0.15 –0.02 –0.27 –0.16 –0.05 –0.06 –0.08 0.85 –0.06
UNM 0.24 0.29 –0.08 0.01 0.00 0.01 –0.19 0.04 0.01 0.00 –0.05 –0.03 0.92
PWG –0.05 –0.01 0.02 –0.12 0.03 0.06 –0.19 0.07 0.02 0.21 –0.01 0.05 –0.02
Table 4.1. Continuation …
HM EM ANM BDM CHM GVM HEM LMM MAM SAM SHM SMM UNM
HM 1.00
EM 0.19 1.00
ANM –0.06 –0.05 1.00
BDM –0.03 0.01 –0.04 1.00
CHM 0.02 0.03 –0.16 –0.05 1.00
GVM 0.00 0.03 –0.01 0.00 –0.03 1.00
HEM –0.16 –0.16 –0.25 –0.10 –0.33 –0.04 1.00
LMM 0.03 0.08 –0.10 –0.04 –0.15 –0.02 –0.25 1.00
MAM 0.00 0.03 –0.03 –0.01 –0.04 –0.01 –0.07 –0.03 1.00
SAM –0.03 0.01 –0.04 –0.02 –0.06 –0.01 –0.10 –0.04 –0.01 1.00
SHM 0.02 –0.02 –0.07 –0.03 –0.09 –0.01 –0.14 –0.06 –0.01 –0.02 1.00
SMM 0.06 0.11 –0.16 –0.06 –0.20 –0.03 –0.29 –0.16 –0.05 –0.06 –0.09 1.00
UNM 0.31 0.08 –0.10 –0.01 –0.04 –0.01 –0.23 –0.01 –0.02 –0.02 –0.06 –0.05 1.00
PWG 0.01 0.06 0.01 0.02 0.05 0.04 –0.19 –0.07 0.03 0.07 –0.01 0.22 –0.01
a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, SM = Simmental, UN = Unknown, and
PWG = pre-weaning gain.
Table 4.2. Eigenvalues of the correlation matrix among predictor variables of fixed
genetic effects and corresponding condition indices
Eigenvalue Condition index Eigenvalue Condition index
2.85885 1.00000 0.24485 3.41702
2.32962 1.10778 0.19129 3.86586
2.22183 1.13433 0.18175 3.96606
2.16618 1.14881 0.16806 4.12443
2.01692 1.19056 0.15531 4.29032
1.95458 1.20940 0.11851 4.91155
1.90077 1.22640 0.10036 5.33732
1.85255 1.24226 0.08539 5.78605
1.83773 1.24725 0.07706 6.09099
1.80300 1.25921 0.05844 6.99450
0.91022 1.77224 0.05078 7.50351
0.71406 2.00092 0.00189 38.85385
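The condition indices in Table 4.2 appear to follow the usual definition, namely the square root of the ratio of the largest eigenvalue to each eigenvalue. The following minimal sketch assumes that definition (the function name is illustrative) and recomputes the indices from the rounded eigenvalues listed in the table:

```python
import math

# Eigenvalues of the correlation matrix, copied from Table 4.2 (rounded to 5 decimals).
EIGENVALUES = [
    2.85885, 2.32962, 2.22183, 2.16618, 2.01692, 1.95458,
    1.90077, 1.85255, 1.83773, 1.80300, 0.91022, 0.71406,
    0.24485, 0.19129, 0.18175, 0.16806, 0.15531, 0.11851,
    0.10036, 0.08539, 0.07706, 0.05844, 0.05078, 0.00189,
]


def condition_indices(eigenvalues):
    """Return sqrt(lambda_max / lambda_i) for every eigenvalue."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / value) for value in eigenvalues]


indices = condition_indices(EIGENVALUES)
eigenvalue_sum = sum(EIGENVALUES)
```

The eigenvalues sum to approximately 24, the number of predictor variables, as expected for a correlation matrix. The largest recomputed index differs slightly from the tabulated 38.85 because the smallest eigenvalue is available here only to five decimals.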
Table 4.3. Decomposition of the variance structure of the parameter estimates associated
with the two largest condition indices
Predictor    Condition index = 38.85    Condition index = 7.50
variable a   Direct   Maternal          Direct   Maternal
H 0.00 0.12 0.01 0.85
E 0.12 0.00 0.83 0.0
AN 0.91 0.74 0.00 0.00
BD 0.74 0.40 0.00 0.00
CH 0.96 0.83 0.00 0.00
GV 0.43 0.13 0.00 0.00
HE 0.95 0.85 0.00 0.00
LM 0.95 0.76 0.00 0.00
MA 0.60 0.30 0.00 0.00
SA 0.66 0.31 0.00 0.00
SH 0.66 0.48 0.00 0.00
SM 0.96 0.82 0.00 0.00
a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, and SM = Simmental.
Table 4.4. Values of the ridge parameter (K) obtained by ridge regression methods R1
and R2, for direct and maternal genetic effects
Predictor    Direct       Maternal
variable a   R1   R2      R1   R2
H 0.000036 0.000754 0.004070 0.004438
E 0.000078 0.004377 0.000092 0.000413
AN 0.003156 0.020457 0.000515 0.008442
BD 0.010801 0.005596 0.027317 0.002506
CH 0.010583 0.030383 0.001836 0.010470
GV 0.005848 0.002052 0.000789 0.001359
HE 0.003317 0.040000 0.034061 0.017590
LM 0.000861 0.024490 0.000328 0.006996
MA 0.000522 0.003141 0.000308 0.001843
SA 0.000442 0.006771 0.011455 0.003565
SH 0.000094 0.006984 0.000413 0.004834
SM 0.000698 0.028564 0.000049 0.010279
a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, and SM = Simmental.
Table 4.5. Summary of results obtained over one hundred bootstrap samples for ordinary
least squares (LS) and ridge regression methods R1 and R2
Statistic   LS   Ridge regression
                 R1   R2
MSEP a ± SD 245.67 ± 2.13 238.40 ± 0.87 238.44 ± 0.87
Average VIF b 26.81 6.10 4.18
Maximum VIF 104.50 10.92 8.70
VIF > 10 c 16 2 0
Total variance d 153,371.55 34,918.49 23,909.80
$\hat{b}'\hat{b}$ 1,324.27 933.18 466.91
a Mean squared error of prediction.
b Average variance inflation factor of 24 predictor variables.
c Number of predictor variables out of 24 with VIF higher than 10.
d Total variances of LS and ridge regression methods were obtained by $\mathrm{trace}[(X'X)^{-1}]\sigma^2$
and $\mathrm{trace}[(X'X + K)^{-1} X'X (X'X + K)^{-1}]\sigma^2$, respectively.
Table 4.6. Estimates of direct and maternal dominance (H) and epistatic loss (E) effects
on pre-weaning gain (kg), obtained by ordinary least squares (LS) and ridge regression
methods R1 and R2
Effect   Method   Direct                     Maternal
H        LS       2.67 ± 0.07 (1.31%) a      4.64 ± 0.17 (2.28%)
H        R1       2.67 ± 0.07 (1.31%)        3.50 ± 0.15 (1.72%)
H        R2       2.51 ± 0.07 (1.23%)        3.16 ± 0.15 (1.55%)
E        LS       –4.45 ± 0.31 (–2.19%)      –0.16 ± 0.15 (–0.08%)
E        R1       –2.11 ± 0.29 (–1.04%)      –0.11 ± 0.15 (–0.05%)
E        R2       –1.34 ± 0.27 (–0.66%)      –0.07 ± 0.15 (–0.03%)
a Values between parentheses were expressed relative to the overall phenotypic average.
Table 4.7. Estimates of direct and maternal breed additive effects on pre-weaning gain
(kg), as deviations from Angus, obtained by ordinary least squares (LS) and ridge
regression methods R1 and R2
Breed a   LS                  Ridge regression R1   Ridge regression R2
          Direct   Maternal   Direct   Maternal     Direct   Maternal
BD 5.34 ± 0.72 –5.68 ± 0.53 3.50 ± 0.36 –4.00 ± 0.36 2.07 ± 0.36 –3.92 ± 0.41
CH 13.19 ± 0.62 –2.88 ± 0.36 9.57 ± 0.21 –0.96 ± 0.16 5.28 ± 0.11 1.09 ± 0.14
GV 10.41 ± 0.92 7.91 ± 0.89 8.62 ± 0.71 8.73 ± 0.82 7.65 ± 0.69 8.92 ± 0.82
HE –6.26 ± 0.63 –3.21 ± 0.36 –6.04 ± 0.20 –3.07 ± 0.11 –1.36 ± 0.10 –5.15 ± 0.12
LM –3.07 ± 0.63 0.61 ± 0.38 –4.24 ± 0.24 1.27 ± 0.20 –1.48 ± 0.13 –0.29 ± 0.18
MA 12.29 ± 0.79 0.25 ± 0.59 10.88 ± 0.54 1.02 ± 0.50 9.17 ± 0.50 1.84 ± 0.49
SA 0.60 ± 0.76 7.55 ± 0.60 3.05 ± 0.47 4.63 ± 0.47 0.12 ± 0.43 7.06 ± 0.47
SH –9.30 ± 0.75 4.57 ± 0.47 –9.83 ± 0.49 4.86 ± 0.35 –4.47 ± 0.42 2.13 ± 0.32
SM 14.19 ± 0.63 5.33 ± 0.37 12.33 ± 0.24 6.34 ± 0.18 6.19 ± 0.12 8.76 ± 0.15
a BD = Blond D’Aquitane, CH = Charolais, GV = Gelbvieh, HE = Hereford,
LM = Limousin, MA = Maine-Anjou, SA = Salers, SH = Shorthorn, and
SM = Simmental.
Table 4.8. Number of calves including records from 1986 to the indicated year, expressed
as equivalent purebred calves
Breed a   1996             1997          1998          1999
          Number b   % c   Number   %    Number   %    Number
AN 40,203 88.65 42,192 93.03 43,864 96.72 45,352
BD 7,119 75.15 8,075 85.24 8,842 93.34 9,473
CH 84,505 88.78 88,409 92.88 92,022 96.67 95,190
GV 1,278 58.87 1,598 73.61 1,865 85.91 2,171
HE 125,674 95.74 127,780 97.35 129,659 98.78 131,265
LM 64,290 93.11 66,278 95.99 67,841 98.25 69,047
MA 4,436 94.06 4,588 97.29 4,700 99.66 4,716
SA 7,944 86.98 8,431 92.31 8,897 97.42 9,133
SH 11,913 93.14 12,312 96.26 12,560 98.19 12,791
SM 72,647 89.81 76,022 93.98 78,575 97.14 80,890
a AN = Angus, BD = Blond D’Aquitane, CH = Charolais, GV = Gelbvieh,
HE = Hereford, LM = Limousin, MA = Maine-Anjou, SA = Salers, SH = Shorthorn,
and SM = Simmental.
b To obtain the number of equivalent purebred calves, breed portions in the breed
compositions were added over all calves in the dataset.
c Number of calves in each year expressed as percentages of the total number of calves.
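Footnote b can be illustrated with a short sketch. The calf records below are hypothetical, and the function name is illustrative; the only assumption taken from the table is that each calf's breed composition is expressed as fractions that are summed over calves. The percentages in the table appear to be relative to the 1999 totals, which is checked here for Angus.

```python
from collections import defaultdict


def equivalent_purebred_counts(breed_compositions):
    """Sum each breed's fraction over all calves (footnote b of Table 4.8)."""
    totals = defaultdict(float)
    for composition in breed_compositions:
        for breed, fraction in composition.items():
            totals[breed] += fraction
    return dict(totals)


# Hypothetical calves: breed composition as fractions that sum to 1.
calves = [
    {"AN": 1.0},
    {"AN": 0.5, "HE": 0.5},
    {"HE": 0.75, "SM": 0.25},
]
counts = equivalent_purebred_counts(calves)

# Angus values from Table 4.8: cumulative count to 1996 relative to the 1999 count.
angus_1996_pct = 100 * 40203 / 45352
```

Under this reading, a half-Angus calf contributes 0.5 to the Angus count, so crossbred records add to the information available on every breed they carry.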
Table 4.9. Values of the ridge parameter (K), obtained by ridge regression methods R1
and R2, using records from 1986 to 1996
Predictor    Direct       Maternal
variable a   R1   R2      R1   R2
H 0.000044 0.000877 0.002221 0.005142
E 0.000072 0.005068 0.000110 0.000479
AN 0.001998 0.023342 0.000983 0.009726
BD 0.008346 0.006061 0.008435 0.002711
CH 0.015607 0.034452 0.003557 0.011923
GV 0.002693 0.002255 0.004304 0.001556
HE 0.007567 0.046000 0.000489 0.020278
LM 0.000864 0.028105 0.000220 0.008025
MA 0.000656 0.003669 0.008829 0.002151
SA 0.001061 0.007809 0.008047 0.004145
SH 0.000168 0.008066 0.000179 0.005608
SM 0.000399 0.032597 0.001565 0.011728
a H = dominance, E = epistatic loss, AN = Angus, BD = Blond D’Aquitane,
CH = Charolais, GV = Gelbvieh, HE = Hereford, LM = Limousin, MA = Maine-Anjou,
SA = Salers, SH = Shorthorn, and SM = Simmental.
[Figure 4.1 plot: VIF (y-axis, 0 to 120) by predictor variable (x-axis: H, E, AN, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: Direct, Maternal]
Figure 4.1. Variance inflation factor (VIF) associated with predictor variables of
direct and maternal dominance (H), epistatic loss (E), and breed additive effects
[Figure 4.2 plot: two panels, Direct Effects and Maternal Effects; y-axis: Estimate (kg); x-axis: Iteration (0 to 135); series: H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM]
Figure 4.2. Convergence of the estimates of direct and maternal dominance (H), epistatic
loss (E), and breed additive effects under ridge regression method R1
[Figure 4.3 plot: two panels, Direct Effects and Maternal Effects; y-axis: Estimate (kg); x-axis: Iteration (0 to 60); series: H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM]
Figure 4.3. Convergence of the estimates of direct and maternal dominance (H), epistatic
loss (E), and breed additive effects under ridge regression method R2
[Figure 4.4 plot: two panels, Direct Effects (VIF axis 0 to 120) and Maternal Effects (VIF axis 0 to 50); x-axis: Predictor variable (H, E, AN, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: LS, R1, R2]
Figure 4.4. Variance inflation factor (VIF) associated with predictor variables of direct
and maternal dominance (H), epistatic loss (E), and breed additive effects under ordinary
least squares (LS) and ridge regression methods R1 and R2
[Figure 4.5 plot: two panels for direct effects; y-axes: Estimate (kg) and Standard error (kg); x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: LS, R1, R2]
Figure 4.5. Estimates (as deviations from AN) and standard errors of direct dominance
(H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and
ridge regression methods R1 and R2
[Figure 4.6 plot: two panels for maternal effects; y-axes: Estimate (kg) and Standard error (kg); x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: LS, R1, R2]
Figure 4.6. Estimates (as deviations from AN) and standard errors of maternal dominance
(H), epistatic loss (E), and breed additive effects under ordinary least squares (LS) and
ridge regression methods R1 and R2
[Figure 4.7 plot: y-axis: Sampling correlation × –1.0 (0.5 to 1); x-axis: Predictor variable (AN, BD, CH, GV, HE, LM, MA, SA, SH, SM, and HM × ED); series: LS, R1, R2]
Figure 4.7. Sampling correlations (multiplied by –1.0) between estimates of maternal
dominance (HM) and direct epistatic loss (ED) effects and between estimates of direct and
maternal breed additive effects given by ordinary least squares (LS) and ridge regression
methods R1 and R2
[Figure 4.8 plot: three panels, Sires, Dams, and Calves; y-axis: Correlation (0.6 to 1); x-axis: Model (AD-AH, AD-LS, AD-R2, ADE-LS, ADE-R1); series: Pearson, Spearman, 40%, 20%, 10%, 1%]
Figure 4.8. Pearson and Spearman correlations, and percentages of coincidence for
different proportions of selected (top 1%, 10%, 20%, and 40%) sires, dams, and calves on
the basis of ABC yielded by different models compared to model ADE-R2
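The percentage of coincidence reported in Figure 4.8 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the thesis: the animal identifiers and ABC values are hypothetical, the function names are illustrative, and coincidence is taken to mean the share of animals selected under one model that are also selected under the reference model for the same selected proportion.

```python
def top_fraction(scores, fraction):
    """Return the set of animal IDs ranked in the top `fraction` by score."""
    n_selected = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_selected])


def percent_coincidence(scores_a, scores_b, fraction):
    """Percentage of animals selected under model A that model B also selects."""
    top_a = top_fraction(scores_a, fraction)
    top_b = top_fraction(scores_b, fraction)
    return 100.0 * len(top_a & top_b) / len(top_a)


# Hypothetical ABC values for ten animals under a reference and an alternative model.
abc_reference = {f"animal{i}": float(i) for i in range(10)}
abc_alternative = dict(abc_reference)
# Swap the values of two animals so that the rankings differ at the top.
abc_alternative["animal9"], abc_alternative["animal5"] = 5.0, 9.0

coincidence_20 = percent_coincidence(abc_reference, abc_alternative, 0.20)
coincidence_40 = percent_coincidence(abc_reference, abc_alternative, 0.40)
```

A measure of this kind is sensitive to re-ranking near the truncation point, which Pearson and Spearman correlations computed over all animals can mask.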
[Figure 4.9 plot: two panels, Direct Effects and Maternal Effects; y-axis: Estimate (kg); x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: 1996, 1997, 1998, 1999]
Figure 4.9. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed
additive effects (as deviations from AN), under ordinary least squares, using records from
1986 to the indicated year
[Figure 4.10 plot: two panels, Direct Effects and Maternal Effects; y-axis: Estimate (kg); x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: 1996, 1997, 1998, 1999]
Figure 4.10. Estimates of direct and maternal dominance (H), epistatic loss (E), and
breed additive effects (as deviations from AN), under ridge regression method R1, using
records from 1986 to the indicated year (ridge parameter K was obtained using records
from 1986 to 1996)
[Figure 4.11 plot: two panels, Direct Effects and Maternal Effects; y-axis: Estimate (kg); x-axis: Predictor variable (H, E, BD, CH, GV, HE, LM, MA, SA, SH, SM); series: 1996, 1997, 1998, 1999]
Figure 4.11. Estimates of direct and maternal dominance (H), epistatic loss (E), and breed
additive effects (as deviations from AN), under ridge regression method R2, using records
from 1986 to the indicated year (ridge parameter K was obtained using records from 1986
to 1996)
Chapter 5
General Discussion
The previous chapters investigated several problems relating to statistical methods for
estimating breeding values of animals in a multi-breed population of beef cattle.
In Chapter 2, methods for measuring the degree of connectedness among
test groups of centrally tested beef bulls were assessed and compared. Models to predict
PEVD, which could be routinely used in genetic evaluation programs, were defined.
Chapter 3 was concerned with estimation of variance components, multi-breed additive
genetic changes, and direct and maternal breed additive, dominance, and epistatic loss
effects on pre-weaning gain. Chapter 4 dealt with estimation of genetic effects in the
presence of multicollinearity. Emphasis was placed on acquiring stable estimates of direct and
maternal breed additive, dominance, and epistatic loss effects, which could
contribute to more accurate and consistent multi-breed genetic evaluation of beef cattle.
Degree of connectedness among test groups of centrally tested beef bulls
The degree of connectedness among test groups is likely a limiting factor for effective
selection across test groups. With a lower degree of connectedness between test groups,
comparison of animals’ EBV from different groups is less accurate and can result in
incorrect ranking of animals across test groups. Kennedy and Trus (1993) suggested that
PEVD should be the basis for measuring connectedness. This statistic, however, is
computationally demanding and very difficult to apply in routine large-scale genetic
evaluation. Various criteria have been proposed for measuring connectedness, but most
are not feasible for implementation in very large-scale genetic evaluation. In Chapter 2,
three alternative methods (VED, CR, and GLT) were studied and used in models to
predict PEVD using weights of bulls tested in central evaluation stations in Ontario,
Canada, from 1988 to 2000. The degree of connectedness was calculated for pairs of test
groups and for each test group with all other test groups. Connectedness between pairs of
test groups indicates the level of accuracy in comparing EBV of animals from two test
groups, whereas average connectedness of each test group with all others indicates the
level of average accuracy in comparing EBV of an animal with animals in all other test
groups, allowing effective selection across all test groups.
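For reference, the prediction error variance of the difference between two EBV is the sum of the two prediction error variances minus twice the prediction error covariance. The sketch below illustrates this with hypothetical values. Averaging over all cross-group pairs is an assumption made here for illustration, and the function names and numbers are not taken from Chapter 2.

```python
from itertools import product


def pevd(pev_i, pev_j, pec_ij):
    """PEV of the difference between two EBV: PEV_i + PEV_j - 2 * PEC_ij."""
    return pev_i + pev_j - 2.0 * pec_ij


def average_pevd(pev, pec, group_a, group_b):
    """Average PEVD over all pairs formed by one animal from each group."""
    values = [
        pevd(pev[i], pev[j], pec[frozenset((i, j))])
        for i, j in product(group_a, group_b)
    ]
    return sum(values) / len(values)


# Hypothetical prediction error variances and covariances (illustrative only).
pev = {"bull1": 0.60, "bull2": 0.55, "bull3": 0.70}
pec = {
    frozenset(("bull1", "bull3")): 0.10,
    frozenset(("bull2", "bull3")): 0.05,
}

avg = average_pevd(pev, pec, ["bull1", "bull2"], ["bull3"])
```

A larger positive prediction error covariance between animals in different groups lowers PEVD, which is how genetic links across groups improve the accuracy of comparisons.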
Results presented in Chapter 2 indicate that the average PEVD of pairs of test groups
can be more accurately predicted on the basis of the model that includes GLT than on the
basis of models that include either VED or CR. Average PEVD of each test group with all
other test groups can be more accurately predicted on the basis of models that include
either CR or GLT. Because GLT is computationally less demanding than CR, it could be
easily and routinely calculated.
The GLT method used for measuring the degree of connectedness between test
groups, in its original form, considers only the direct genetic links due to common sires
and dams. The animal model used in the estimation of PEVD accounts for a larger
number of relationships than those due to common sires and dams. The fact that GLT and
PEVD were highly associated and that 94.5% of the total number of genetic links
between test groups was due to the use of common sires indicates that, in terms of
connectedness, common sires accounted for the most important relationships. Hanocq and
Boichard (1999) reported similar observations for the French national evaluation of
Holsteins. In their study, many of the additional relationships beyond those due to
common sires were within herds and, therefore, did not contribute to increasing the
accuracy of comparisons among herds. Before relationships other than those due
to common sires and dams are included in the calculation of GLT, the extra computational cost
must be weighed against the gain in accuracy of predicting PEVD.
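A count of direct genetic links between two test groups can be sketched as below. The exact counting rule of Chapter 2 is not reproduced in this section, so the rule used here, the number of distinct sires plus the number of distinct dams with progeny in both groups, is an assumption, as are the function name and the pedigree records.

```python
def genetic_links(group_a, group_b):
    """Count distinct sires plus distinct dams that have progeny in both groups."""
    sires_a = {sire for sire, _dam in group_a}
    sires_b = {sire for sire, _dam in group_b}
    dams_a = {dam for _sire, dam in group_a}
    dams_b = {dam for _sire, dam in group_b}
    return len(sires_a & sires_b) + len(dams_a & dams_b)


# Hypothetical test groups: one (sire, dam) tuple per tested bull.
tg1 = [("S1", "D1"), ("S1", "D2"), ("S2", "D3")]
tg2 = [("S1", "D4"), ("S3", "D3")]
tg3 = [("S4", "D5")]

links_12 = genetic_links(tg1, tg2)  # sire S1 and dam D3 are shared
links_13 = genetic_links(tg1, tg3)  # no shared parents
```

Because only set intersections over parent identifiers are needed, a count of this kind avoids the matrix inversion that PEVD requires, and a count of zero flags a pair of groups with no direct link.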
All measures of connectedness studied showed a decrease in the degree of
connectedness among test groups after 1994. Thus, the current trend in the accuracy of
comparisons of bulls tested in different test groups in Ontario is not favourable, even
though a requirement of a minimum of 12 bulls and 4 sires per test group has been
observed when determining the test groups.
To reverse the current trend in connectedness and increase the
accuracy of comparisons, recommendations must be developed. These recommendations
should include the following ideas. (1) Artificial insemination plays an important role
because it allows the distribution of progeny of sires across herds and, by extension,
across test groups. The planned use of common sires with high genetic values across
herds can increase connectedness among test groups, along with promoting genetic
improvement of the whole population. (2) The GLT, which was highly associated with
PEVD and has the practical advantage of not being computationally demanding, could be
rapidly determined through simulation when test groups are proposed, and decisions could
then be made to increase the number of genetic links among test groups, allowing accurate
comparison of EBV across test groups. When forming the test groups, the genetic
relationships among bulls within and across test groups have opposite effects, as shown
by Kennedy and Trus (1993). The degree of connectedness increases with relationships
across test groups, whereas it decreases when the within-group relationship increases.
Practical implications
The common practice in beef cattle of selecting on estimated breeding values
obtained by Best Linear Unbiased Prediction allows comparison of animals tested in different
environments (e.g., test groups), provided that the environmental units are genetically
connected. A high degree of connectedness is associated with a high accuracy of
comparison of EBV of animals tested in different environmental units, enabling increased
rates of genetic gain. There was no well-established procedure for measuring the degree
of connectedness among groups of station tested beef bulls. In this study, different
methods for measuring the degree of connectedness were compared. The GLT method,
which is based on the total number of direct genetic links due to common sires and dams
between test groups, was suggested for measuring the degree of connectedness among
groups of centrally tested beef bulls with the aim of improving the accuracy of
comparison of bulls’ EBV across test groups. This method is not computationally
demanding, distinguishes completely disconnected test groups from
connected ones, and is highly correlated with the PEV of comparisons between groups of
bulls.
Limitations and suggestions for further investigations
In this investigation, the methods were evaluated under a univariate animal model.
Given that multiple trait models are widely used in the beef industry, investigations of
connectedness using multiple trait models, particularly when animals are not observed for
all traits and/or traits have different models, would merit further examination. Studies
including a simulation to evaluate the degree of dependence of the alternative methods on
the particular structure of the data are warranted.
Additive, dominance, and epistatic loss on pre-weaning gain in crossing of different Bos taurus breeds
Genetic evaluations involving purebred and crossbred animals from a large number of
breeds have been ongoing in Ontario for many years (Miller et al., 1995). One of the main
reasons for a multi-breed genetic evaluation is the possibility of comparing animals of
various breeds and breed constitutions in the pooled dataset, enabling effective use of the
genetic variability that exists in the whole population. Estimates of variance components,
heterosis, breed effects, and breed additive genetic changes have been previously
obtained in Ontario (Miller, 1996; Sullivan et al., 1999), but no estimates were available
of separate direct and maternal dominance and epistatic loss effects associated
with breed heterozygosities. The main objective of Chapter 3 was to obtain estimates of
variance components, breed additive genetic changes, and breed additive, dominance, and
epistatic loss effects on pre-weaning gain in Ontario.
The database available to develop this study included data from approximately 60
different breeds, as well as crossbreds. Many of these breeds, however, were represented
by a small number of animals. Estimating all effects included in the genetic model for
those breeds with few records could be inaccurate. For this reason, a subset of the 10 most
popular breeds, including Angus, Blond D’Aquitane, Charolais, Gelbvieh, Hereford,
Limousin, Maine-Anjou, Salers, Shorthorn, and Simmental, was chosen for the analysis.
The GLT method, which was described in Chapter 2, was used to identify a subset of
genetically connected contemporary groups across breeds to be used in the analysis. This
procedure was intended to minimize possible confounding between contemporary group
and genetic effects.
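One way to identify a connected subset of contemporary groups is to treat groups as linked when they share a sire or dam and then collect the groups that can be reached from one another through chains of such links. The sketch below does this with a union-find structure. It is an illustration only: the rule applied in Chapter 3 may impose further requirements, such as a minimum number of links, and the group and parent identifiers are hypothetical.

```python
def connected_sets(parents_by_group):
    """Partition contemporary groups into sets linked through shared parents."""
    root = {group: group for group in parents_by_group}

    def find(group):
        # Follow parent pointers to the representative, halving paths on the way.
        while root[group] != group:
            root[group] = root[root[group]]
            group = root[group]
        return group

    first_group_for_parent = {}
    for group, parents in parents_by_group.items():
        for parent in parents:
            if parent in first_group_for_parent:
                # A shared parent links this group to an earlier one.
                root[find(group)] = find(first_group_for_parent[parent])
            else:
                first_group_for_parent[parent] = group

    components = {}
    for group in parents_by_group:
        components.setdefault(find(group), set()).add(group)
    return sorted(components.values(), key=len, reverse=True)


# Hypothetical contemporary groups with the sires and dams represented in each.
groups = {
    "CG1": {"S1", "D1"},
    "CG2": {"S1", "S2"},
    "CG3": {"S2", "D9"},
    "CG4": {"S7"},
}
sets = connected_sets(groups)
```

In this example CG1 and CG3 share no parent directly but are connected through CG2, whereas CG4 is isolated and would be excluded from a connected analysis.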
Estimates of variance components obtained in Chapter 3 did not greatly differ from
previous studies in Ontario. Expressed as proportions of the phenotypic variance, direct
additive genetic, maternal additive genetic, maternal permanent environment, and residual
variances were equal to 0.32, 0.20, 0.12, and 0.52, respectively. A strong genetic
correlation of –0.63 between direct and maternal effects was found. This correlation
more likely reflected insufficient information in the dataset to
separate partially confounded effects than a true antagonistic relationship.
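The four proportions reported above sum to 1.16 rather than 1. A plausible reading, assuming the usual decomposition of the phenotypic variance in a maternal-effects animal model (direct additive + maternal additive + direct-maternal covariance + maternal permanent environment + residual), is that the negative direct-maternal covariance accounts for the difference. The decomposition is an assumption here, since this section does not state it; the following check uses only the values quoted in the text:

```python
import math

# Proportions of phenotypic variance and genetic correlation quoted in the text.
direct_additive = 0.32
maternal_additive = 0.20
maternal_perm_env = 0.12
residual = 0.52
r_direct_maternal = -0.63

# Direct-maternal covariance as a proportion of phenotypic variance.
cov_direct_maternal = r_direct_maternal * math.sqrt(direct_additive * maternal_additive)

# Assumed decomposition: the covariance enters the phenotypic variance once.
total = (
    direct_additive
    + maternal_additive
    + cov_direct_maternal
    + maternal_perm_env
    + residual
)
```

Under this assumption the covariance is about –0.16 of the phenotypic variance and the components sum to approximately 1, so the reported proportions are internally consistent.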
Annual breed additive genetic changes obtained for Angus, Charolais, Hereford,
Limousin, and Simmental, using two different approaches, indicated positive annual
additive genetic changes. The traditional approach based on yearly average breeding
values of purebred calves and the alternative approach based on regression of yearly
estimated breeding values on contribution of each breed to the breed composition of the
calves revealed differences in selection practices among breeds. Producers used animals
of substantially higher additive genetic values to produce purebred Charolais, Hereford,
and Simmental than to produce crossbred animals. Producers of Angus and Limousin
used animals of similar genetic values to produce both purebred and crossbred animals.
Direct and maternal dominance effects were favourable, equivalent to
1.31% and 2.28% of the phenotypic average, respectively. Direct and maternal epistatic loss effects
were equivalent to –2.19% and –0.08%, respectively, but the maternal epistatic loss effect
was statistically not different from zero (P > 0.05). To detect significant effects, a larger
variety of crossbred sires in the dataset might be required. Standard errors of maternal
dominance and direct epistatic loss effects were large, in comparison to standard errors of
direct dominance and maternal epistatic loss effects. Additional analysis excluding
epistatic loss effects resulted in estimates of direct and maternal dominance of 1.31% and
1.84%, respectively. Therefore, estimates of dominance from both models did not differ
greatly. The estimates of direct and maternal dominance effects obtained in this
investigation were substantially lower than the heterosis of 5% assumed in the genetic
evaluation procedures for pre-weaning gain currently used in Ontario.
Estimates of breed additive effects were in general agreement with expectations
based on previous studies in Ontario. Standard errors of the estimates of breed effects and
sampling correlations between estimates of direct and maternal breed effects were high.
Practical implications
Estimates of variance components obtained in this study were in line with previous
studies in Ontario. Because the strong negative genetic correlation between direct and
maternal genetic effects was more likely due to insufficient information in the dataset
to estimate these two partially confounded effects accurately, it seems reasonable to
assume a zero genetic correlation between direct and maternal effects in the genetic
evaluation until an alternative parameter is verified with further research.
In the multi-breed genetic evaluation currently run in Ontario, pre-weaning gain
records are pre-adjusted for direct and maternal heterosis assuming a level of 5% for both,
based on average values from literature. Given the accumulated evidence that the level of
direct and maternal heterosis on pre-weaning gain is lower than 5%, this level should be
reviewed in the multi-breed genetic evaluation in this population.
Large standard errors of the estimates of breed effects and high sampling correlations
between estimates of direct and maternal breed effects can be a symptom of a lack of
sufficient information to estimate both direct and maternal breed effects and/or
multicollinearity among predictor variables of breed effects. Because estimates of breed
effects comprise part of the across-breed estimated breeding values (across-breed
comparisons or ABC) used as selection criterion across breeds, problems with accurate
estimation of breed effects may result in unreliable ranking of the animals.
Limitations and suggestions for further investigations
Direct and maternal epistatic loss effects were estimated using average
heterozygosities of the parents of an individual (ED) and its mother (EM) as predictor
variables (Fries et al., 2000). Coefficients ED and EM are easily determined because they
are simple functions of the heterozygosities of the parents. In addition, they allow for
differentiating the amount of epistatic loss in the F2 from the amount in the F3 and further
advanced generations. ED and EM have a relative probabilistic interpretation, but they are
not directly biologically interpretable. The average epistatic loss due to the breakdown of
all kinds of gene interactions, as a deviation from the average additive and dominance
effects, is estimated by ED and EM (Fries et al., 2002). A drawback of this approach is
that recombination between uniting gametes is partially confounded with dominance
effects, as occurs in the definition of recombination loss (Dickerson, 1973). The
drawback comes from the fact that interactions between genes (in the same locus or in
different loci) are taken as a dominance effect, while interactions between genes
occurring one generation back, at the gamete level, are taken as recombination or epistatic
loss. Further investigations to compare ED and EM with different predictor variables for
epistatic loss are warranted.
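The way in which ED separates the F2 from the F3 can be illustrated with a two-breed example. The sketch assumes the usual expected breed heterozygosity (one minus the sum over breeds of the product of the sire and dam breed fractions) and takes ED as the average heterozygosity of the two parents, as described above. The exact coefficients of Fries et al. are not reproduced in this section, and the function names are illustrative. EM would be obtained by applying the same average to the parents of the dam.

```python
def heterozygosity(sire_comp, dam_comp):
    """Expected breed heterozygosity of the offspring of two parents."""
    breeds = set(sire_comp) | set(dam_comp)
    same_breed = sum(sire_comp.get(b, 0.0) * dam_comp.get(b, 0.0) for b in breeds)
    return 1.0 - same_breed


def offspring_composition(sire_comp, dam_comp):
    """Breed composition of the offspring: average of the parental compositions."""
    breeds = set(sire_comp) | set(dam_comp)
    return {b: 0.5 * (sire_comp.get(b, 0.0) + dam_comp.get(b, 0.0)) for b in breeds}


def epistatic_loss_coefficient(h_parent_a, h_parent_b):
    """Average heterozygosity of two parents (E_D for an individual's own parents)."""
    return 0.5 * (h_parent_a + h_parent_b)


angus, hereford = {"AN": 1.0}, {"HE": 1.0}
h_pure = 0.0  # purebred parents carry no breed heterozygosity

f1 = offspring_composition(angus, hereford)
h_f1 = heterozygosity(angus, hereford)               # 1.0
ed_f1 = epistatic_loss_coefficient(h_pure, h_pure)   # 0.0

f2 = offspring_composition(f1, f1)
h_f2 = heterozygosity(f1, f1)                        # 0.5
ed_f2 = epistatic_loss_coefficient(h_f1, h_f1)       # 1.0

h_f3 = heterozygosity(f2, f2)                        # 0.5
ed_f3 = epistatic_loss_coefficient(h_f2, h_f2)       # 0.5
```

The F2 and F3 have the same expected heterozygosity (0.5) but different ED coefficients (1.0 and 0.5), which is what makes the two generations distinguishable in the model.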
The breeds included in this study were from distinct biological types. An adequate
decomposition of dominance and epistatic loss effects should consider possible specific
combining ability between pairs of breeds or biological types. However, due to data
structure limitations involving some breeds, the same dominance and epistatic loss effects
were assumed for crosses of different pairs of breeds.
The database available to develop this study includes data from approximately 60
different pure breeds, as well as crossbreds, but only breeds represented by a large
number of animals were considered in the analysis. A question that arises is how to
evaluate animals from breeds represented by a small number of animals in the dataset. An
investigation to evaluate possible alternatives is warranted.
Further work is needed to investigate the nature of the strong antagonistic genetic
correlation between direct and maternal genetic effects. A simulation study could be used
to determine the data structure required to generate accurate genetic correlations between
direct and maternal effects.
Estimates of breed additive effects are included in the across-breed estimated
breeding value used as selection criterion in multi-breed beef cattle. Because estimates of
breed effects were highly unstable, showing high standard errors, an investigation to
detect the causes of instability and application of alternative statistical methods was
conducted in Chapter 4.
Estimation of genetic effects in the presence of multicollinearity
In Chapter 4, a framework was developed for obtaining stable estimates of breed additive,
dominance, and epistatic loss effects on pre-weaning gain when multicollinearity among
predictor variables is of concern. The framework consisted of
first identifying the predictor variables involved in multicollinearity and then
applying ridge regression methods in the estimation of direct and maternal breed additive,
dominance, and epistatic loss effects. The genetic model used in the analysis accounted
for all known genetic relationships among animals through the additive relationship
matrix, which was possible because all animals had complete pedigree information. The
application of such a complete model is generally not possible for field datasets,
particularly when multiple sire mating groups are used.
Multicollinearity diagnostics performed in Chapter 4 indicated that predictor
variables of direct and maternal breed additive effects were the main candidates for linear
dependencies, followed by predictor variables of maternal dominance and direct epistatic
loss effects. Mathematical constraints among predictor variables and the small proportion
of crossbred sires in the dataset seemed to be the main causes of multicollinearity.
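The two causes named above can be illustrated with a small simulation. The following Python sketch is not the diagnostic procedure of Chapter 4. It assumes a hypothetical three-breed population in which every sire is purebred, and the names `variance_inflation_factors`, `sire`, `dam`, and `calf` are introduced here for illustration only. It shows that breed fractions summing to one create an exact dependency with the intercept, and that direct and maternal breed fractions remain correlated when no crossbred sires are present.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(0)
n = 2000

# Assumption: three breeds, all sires purebred (no crossbred sires).
sire = np.eye(3)[rng.integers(0, 3, size=n)]
# Assumption: dams of mixed breed composition (fractions sum to one).
dam = rng.dirichlet([0.3, 0.3, 0.3], size=n)
# Direct breed fractions of the calf are the parental average.
calf = 0.5 * (sire + dam)

# Mathematical constraint: with an intercept, the three direct breed
# fractions are exactly linearly dependent (they sum to one).
full = np.column_stack([np.ones(n), calf])
rank_full = np.linalg.matrix_rank(full)

# Drop one breed, then combine direct (calf) and maternal (dam) fractions.
X = np.column_stack([calf[:, :2], dam[:, :2]])
vif = variance_inflation_factors(X)

print("rank of [1, direct fractions]:", rank_full, "of", full.shape[1], "columns")
print("VIF of direct and maternal breed fractions:", np.round(vif, 2))
```

Under these assumptions the maternal fractions enter the direct fractions with weight one half, so the two sets of predictors cannot vary independently. Crossbred sires would add variation in the direct fractions that is unrelated to the dam.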
The choice of an adequate ridge parameter is one of the most important tasks in a
ridge regression analysis. Two ridge regression methods were used to determine the ridge
parameter: R1 was the generalized ridge estimator of Hoerl and Kennard (1970a) and R2
was based on bootstrap and cross-validation (Delaney and Chatterjee, 1986), extended to
obtain diagonal elements of the ridge parameter matrix K proportional to the variance
inflation factor (VIF) of each regression coefficient under least squares (LS). With this
procedure, unnecessary bias is not imposed on those predictor variables not seriously
involved in multicollinearity.
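The idea of a ridge parameter matrix with diagonal elements proportional to the VIF can be sketched as follows. This is a minimal Python illustration, not the procedure used in Chapter 4: the function name `ridge_vif_weighted`, the scaling constant `k0`, and the division by the largest VIF are assumptions made here, and `k0` is simply fixed rather than chosen by bootstrap and cross-validation. Predictors are standardized so that X'X is the correlation matrix.

```python
import numpy as np

def ridge_vif_weighted(X, y, k0):
    """Generalized ridge with K = k0 * diag(VIF / max(VIF)) on standardized X.

    k0 is a hypothetical scaling constant (assumption of this sketch).
    Returns LS and ridge coefficients on the standardized scale, and the VIFs.
    """
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))   # unit-length columns: Z'Z = correlation matrix
    yc = y - y.mean()
    R = Z.T @ Z
    vif = np.diag(np.linalg.inv(R))           # VIF of each coefficient under LS
    K = np.diag(k0 * vif / vif.max())         # larger penalty where VIF is larger
    b_ls = np.linalg.solve(R, Z.T @ yc)
    b_ridge = np.linalg.solve(R + K, Z.T @ yc)
    return b_ls, b_ridge, vif

# Toy data (assumption): x1 and x2 nearly collinear, x3 unrelated to both.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + rng.normal(size=n)

b_ls, b_ridge, vif = ridge_vif_weighted(X, y, k0=0.05)
print("VIF under LS:", np.round(vif, 1))
print("LS coefficients:   ", np.round(b_ls, 3))
print("Ridge coefficients:", np.round(b_ridge, 3))
```

In this toy example the third predictor has a VIF close to one, so its diagonal element of K is a small fraction of `k0` and its coefficient is left almost unpenalized, while the two collinear predictors receive most of the penalty.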
Ridge parameters determined by both ridge regression methods resulted in a set of
estimates of regression coefficients with smaller mean squared error of prediction (MSEP)
and lower VIF than LS. Average MSEP under both ridge regression methods, obtained over
one hundred bootstrap samples, was 3% lower than under LS. Average VIF under ridge
regression methods R1 and R2 was 77% and 84% lower than under LS, respectively. A model that results in
lower VIF and smaller MSEP is desirable because these statistics indicate stability in the
estimates (lower standard errors) and the ability of the model to predict future
observations.
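Both statistics can be computed for a ridge estimator in a few lines. The Python sketch below is illustrative only and rests on several assumptions: it uses ordinary ridge with a single scalar `k` rather than the matrix K of methods R1 and R2, it evaluates MSEP on the records left out of each bootstrap sample (the exact bootstrap scheme of Chapter 4 is not restated here), and the names `ridge_vif` and `bootstrap_msep` are hypothetical.

```python
import numpy as np

def ridge_vif(R, k):
    """VIFs of the ordinary ridge estimator for correlation matrix R (k = 0 gives LS)."""
    A = np.linalg.inv(R + k * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

def bootstrap_msep(X, y, k, n_boot=100, seed=0):
    """Mean squared error of prediction on out-of-sample records, averaged over bootstrap samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)         # records drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)    # records not drawn in this sample
        if oob.size == 0:
            continue
        mx, my = X[idx].mean(axis=0), y[idx].mean()
        Xc, yc = X[idx] - mx, y[idx] - my
        s = np.sqrt((Xc ** 2).sum(axis=0))       # scale columns to unit length
        Z = Xc / s
        b = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)
        pred = my + ((X[oob] - mx) / s) @ b      # predict the left-out records
        errors.append(np.mean((y[oob] - pred) ** 2))
    return float(np.mean(errors))

# Toy data (assumption): two nearly collinear predictors and one unrelated predictor.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

R = np.corrcoef(X, rowvar=False)
print("VIF, LS:   ", np.round(ridge_vif(R, 0.0), 2))
print("VIF, ridge:", np.round(ridge_vif(R, 0.05), 2))
print("MSEP, LS:   ", round(bootstrap_msep(X, y, 0.0), 3))
print("MSEP, ridge:", round(bootstrap_msep(X, y, 0.05), 3))
```

For a scalar `k` greater than zero the ridge VIFs are always lower than the LS VIFs. A lower MSEP is not guaranteed and depends on the data and on the choice of `k`, which is why the choice of the ridge parameter matters.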
Due to multicollinearity among predictor variables of both direct and maternal breed
additive effects, most regression coefficients of breed effects given by LS showed large
standard errors and were highly confounded. The use of ridge regression methods tended
to alleviate these symptoms of multicollinearity among breed effects. Estimates of breed
effects given by ridge regression were more stable (had lower standard errors) and
showed a lower degree of confounding than estimates given by LS. These desirable
properties of estimates obtained by ridge regression methods will increase the probability
of properly ranking the breeds, which will ultimately result in a suitable ranking of the
animals in the across-breed comparisons.
Estimates of maternal dominance and direct epistatic loss effects had large standard
errors and were highly confounded under LS. The use of ridge regression methods only
slightly reduced the standard errors and the degree of confounding between these
estimates, suggesting that the small proportion of crossbred sires in the dataset,
aggravated by linear dependencies between corresponding predictor variables, did not
provide enough information to effectively separate maternal dominance and direct
epistatic loss effects, even though both effects were statistically significant.
As a consequence, the inclusion of epistatic loss effects in the genetic model did not cause
noticeable re-ranking of the animals in the across-breed comparison.
Practical implications
The causes and consequences of multicollinearity in the genetic evaluation of
multi-breed populations, and methods to minimize its negative consequences, have
received little attention. In this study, a
framework using ridge regression methods was developed to deal with multicollinearity
problems in the genetic evaluation of multi-breed populations.
With an adequate choice of the ridge parameter, ridge regression methods resulted in
lower VIF and smaller MSEP than LS. Estimates obtained by ridge regression were more
stable (lower standard errors) and could be used with advantage over LS for prediction
purposes. Ridge regression methods were particularly effective in alleviating the symptoms
of multicollinearity caused by linear dependencies among predictor variables of breed
additive effects. Besides lower standard errors, estimates of breed effects under ridge
regression methods showed a lower degree of confounding than those under LS. Thus, the
use of the estimates obtained by ridge regression methods can increase the probability of
properly ranking the animals in the across-breed comparisons. In addition, more
consistent across-breed estimated breeding values in successive genetic evaluations can
be expected.
The limited variety of crosses, due to the small proportion of crossbred sires in the dataset,
in addition to the linear dependencies among predictor variables, provided little
opportunity for obtaining accurate separate estimates of dominance and
epistatic loss effects. Due to the high degree of confounding between estimates of
maternal dominance and epistatic loss effects, it was not possible to compare the relative
importance of these components of heterosis with a high level of confidence. The
inclusion of epistatic loss effects in the standard additive-dominance model used in the
multi-breed genetic evaluation does not cause appreciable re-ranking of animals on the
basis of across-breed estimated breeding values.
Limitations and suggestions for further investigations
The variance of the estimate of a particular breed additive effect depends on both the
number of animals from that breed in the analysis and the degree of multicollinearity of the
corresponding predictor variable. The framework developed herein, based on
ridge regression methods, offers an alternative way to account for multicollinearity. A
limitation of ridge regression is that it ignores the number of animals in each breed
when shrinking the estimates to break the dependencies. In this investigation, breeds with
a larger number of records were more associated with multicollinearity. As a consequence,
estimates for breeds with a larger number of animals were shrunk to a higher degree than
those for breeds with a smaller number of records, even though the former group of breeds
showed lower standard errors under LS. The breed difference estimates resulting
from ridge regression methods, although more stable, are reduced, and this reduction
should be further evaluated. Investigation of other evaluated traits and comparison with
alternative methods of dealing with multicollinearity, such as treating genetic groups as
random, are warranted.
References
AAFC. 1993. National standards document for the genetic improvement of Canadian
beef cattle. Livestock Development Division, Agriculture Development Branch,
Agriculture Canada, Ottawa, Ontario.
Arthur, P. F., Hearnshaw, H. and Stephenson, P. D. 1999. Direct and maternal additive
and heterosis effect from crossing Bos indicus and Bos taurus cattle: cow and calf
performance in two environments. Livest. Prod. Sci. 57: 231-241.
Belsley, D. A. 1991. Conditioning diagnostics, collinearity and weak data in regression.
1st ed. John Wiley and Sons, Inc., New York. 396pp.
Belsley, D. A., Kuh, E. and Welsch, R. E. 1980. Regression diagnostics. 1st ed. John
Wiley and Sons, Inc., New York. 320pp.
Cardoso, V. 2004. Direcionando acasalamentos para maximizar a média do valor
genotípico de uma futura safra. Ph.D. Thesis, Universidade Estadual Paulista,
Faculdade de Ciências Agrárias e Veterinárias, Campus de Jaboticabal, Brazil, 101p.
Cassady, J. P., Young, L. D. and Leymaster, K. A. 2002. Heterosis and recombination
effects on pig growth and carcass traits. J. Anim. Sci. 80: 2286-2302.
Chatterjee, S., Hadi, A. S. and Price, B. 2000. Regression analysis by example. 3rd ed.
John Wiley and Sons, Inc., New York. 359pp.
Cunningham, E. P. 1987. Crossbreeding - The Greek Temple Model. J. Anim. Breed.
Genet. 104: 2-11.
Cunningham, E. P. and Connolly, J. 1989. Efficient design of crossbreeding
experiments. Theor. Appl. Genet. 78: 381-386.
Delaney, N. J. and Chatterjee, S. 1986. Use of the bootstrap and cross-validation in
ridge regression. J. Bus. Econ. Statist. 4: 255-262.
Demeke, S., Neser, F. W. C. and Schoeman, S. J. 2003. Early growth performance of
Bos Taurus x Bos Indicus cattle crosses in Ethiopia: Evaluation of different
crossbreeding models. J. Anim. Breed. Genet. 120: 39-50.
Dickerson, G. E. 1969. Experimental approaches in utilising breed resources. Anim.
Breed. Abstr. 37: 191-202.
Dickerson, G. E. 1973. Inbreeding and heterosis in animals. Proc. of the Animal
Breeding and Genetics Symp. in Honor of Dr. J. L. Lush. pp. 54-77. ASAS
Champaign, IL.
Draper, N. R. and Smith, H. 1998. Applied regression analysis. 3rd ed. John Wiley and
Sons, Inc., New York. 706pp.
Efron, B. 1979. Bootstrap methods: Another look at the Jackknife. Ann. Stat. 7: 1-26.
Elzo, M. A., Jara, A. and Barria, N. 2004. Genetic parameters and trends in the Chilean
multibreed dairy cattle population. J. Dairy Sci. 87: 1506-1518.
Foulley, J. L., Hanocq, E. and Boichard, D. 1992. A criterion for measuring the degree
of connectedness in linear models of genetic evaluation. Gen. Sel. Evol. 24: 315-330.
Freund, R. and Littell, R. C. 2000. SAS® System for Regression. 3rd ed. Cary, NC:
SAS Institute Inc. 236pp.
Fries, L. A. 1998. Connectability in beef cattle genetic evaluation: the heuristic approach
used in MILC.FOR. Proc. 6th World Cong. Genet. Appl. Livest. Prod., Armidale,
NSW, Australia. 27: 449-500.
Fries, L. A., Johnston, D. J., Hearnshaw, H. and Graser, H. U. 2000. Evidence of
epistatic effects on weaning weight in crossbred beef cattle. Asian-Aust. J. Anim.
Sci. 13(Suppl. B): 242.
Fries, L. A., Schenkel, F. S., Roso, V. M., Brito, F. V., Severo, J. L. P. and Piccoli, M.
L. 2002. “Epistazygosity” and epistatic effects. Proc. 7th World Cong. Genet. Appl.
Livest. Prod., Montpellier, France. Communication No 17-15.
Goldstein, M. and Smith, A. F. M. 1974. Ridge-type estimators for regression analysis.
J. Roy. Statist. Soc. 36: 284-291.
Gregory, K. E., Cundiff, L. V. and Koch, R. M. 1997. Composite breeds to use
heterosis and breed differences to improve efficiency of beef production. Roman L.
Hruska U. S. MARC, Clay Center, NE.
Groeneveld, E. 1990. PEST User’s Manual. Institute of Animal Husbandry and Animal
Behaviour, Federal Agricultural Research Centre, Germany.
Gruber, M. H. J. 1998. Improving efficiency by shrinkage: the James-Stein and ridge
regression estimators. 1st ed. Marcel Dekker, New York. 632pp.
Hanocq, E. and Boichard, D. 1999. Connectedness in the French Holstein cattle
population. Gen. Sel. Evol. 31: 163-176.
Hébel, P., Faivre, R., Goffinet, B. and Wallach, D. 1993. Shrinkage estimators applied
to prediction of French winter wheat yield. Biometrics. 49: 281-293.
Hoerl, A. E. 1962. Application of ridge analysis to regression problems. Eng. Progress.
58: 54-59.
Hoerl, A. E. and Kennard, R. W. 1970a. Ridge regression: Biased estimation for
nonorthogonal problems. Technometrics. 12: 55-67.
Hoerl, A. E. and Kennard, R. W. 1970b. Ridge regression: Application to
nonorthogonal problems. Technometrics. 12: 69-82.
Johnston, D. J., Tier, B., Graser, H. and Girard, C. 1999. Presenting BREEDPLAN
Version 4.1. Proc. Assoc. Advmt. Anim. Breed. Genet. 13: 193-196.
Kennedy, B. W. and Trus, D. 1993. Considerations on genetic connectedness between
management units under an animal model. J. Anim. Sci. 71: 2341-2352.
Kinghorn, B. 1983. Genetic effects in crossbreeding. III. Epistatic loss in crossbred mice.
Z. Tierzüchtg. Züchtgsbiol. 100: 209-222.
Kinghorn, B. P. and Vercoe, P. E. 1989. The effects of using the wrong genetic model
to predict the merit of crossbred genotypes. Anim. Prod. 49: 209-216.
Klei, L., Quaas, R. L., Pollak, E. J. and Cunningham, B. E. 2002. Multiple-breed
evaluation. Available: http://www.abc.cornell.edu.tmprols/doc1.pdf. Accessed May
12, 2004.
Koch, R. M., Dickerson, G. E., Cundiff, L. V. and Gregory, K. E. 1985. Heterosis
retained in advanced generations of crosses among Angus and Hereford cattle. J.
Anim. Sci. 60: 1117-1132.
Koots, K., Gibson, J. P. and Wilton, J. W. 1994a. Analysis of published parameter
estimates for beef production traits. 1. Heritability. Anim. Breed. Abstr. 62: 309-337.
Koots, K., Gibson, J. P. and Wilton, J. W. 1994b. Analysis of published parameter
estimates for beef production traits. 2. Phenotypic and genetic correlations. Anim.
Breed. Abstr. 62: 825-853.
Laloë, D. 1993. Precision and information in linear models of genetic evaluation. Gen.
Sel. Evol. 25: 557-576.
Long, C. R. 1980. Crossbreeding for beef production: experimental results. J. Anim. Sci.
51: 1197-1223.
Lowerre, J. M. 1974. On the mean square error of parameter estimates for some biased
estimators. Technometrics. 16: 461-464.
Madsen, P. and Jensen, J. 2000. DMU – A package for analysing multivariate mixed
models. Danish Institute of Agricultural Sciences (DIAS), Denmark.
Marquardt, D. W. and Snee, R. D. 1975. Ridge regression in practice. Amer. Statist. 29:
3-20.
Mathur, P. K., Sullivan, B. P. and Chesnais, J. P. 1999. Estimation of the degree of
connectedness between herds or management groups in the Canadian swine
population. Canadian Centre for Swine Improvement, Ottawa, Canada. (Mimeo).
Mathur, P. K., Sullivan, B. P. and Chesnais, J. P. 2002. Measuring connectedness:
concept and application to a large industry program. Proc. 7th World Cong. Genet.
Appl. Livest. Prod., Montpellier, France. Communication No 20-13.
Meyer, K. 1989. Approximate accuracy of genetic evaluation under an animal model.
Livest. Prod. Sci. 21: 87-100.
Meyer, K. 1992. Variance components due to direct and maternal effects for growth traits
of Australian beef cattle. Livest. Prod. Sci. 31: 179-204.
Miller, S. P. 1996. Studies on genetic evaluation and the effect of milk yield on profit
potential in a multi-breed beef cattle population. Ph.D. Thesis, University of Guelph,
Canada. 217p.
Miller, S. P., Wilton, J. W. and Griffiths, S. J. 1995. Utilizing multi-breed genetic
evaluations in beef cattle breeding. Proc. Aust. Assoc. Anim. Breed. Genet. 11: 254.
Misztal, I. and Wiggans, G. R. 1988. Approximation of prediction error variance in
large-scale animal models. J. Dairy Sci. 71(Suppl. 2): 27(Abstr.).
Piccoli, M. L., Roso, V. M., Brito, F. V., Severo, J. L. P., Schenkel, F. S. and Fries, L.
A. 2002. Additive, complementarity (additive x additive), dominance and epistatic
effects on pre-weaning gain of Hereford x Nelore calves. Proc. 7th World Cong.
Genet. Appl. Livest. Prod., Montpellier, France. Communication No 17-16.
Pimentel, E. C. G., Queiroz, S. A., Carvalheiro, R. and Fries, L. A. 2003. Efeitos da
inclusão de epistasia e complementariedade em modelos de avaliação genética em
bovinos de corte. In: Reunião Anual da Sociedade Brasileira de Zootecnia, 40. Santa
Maria-RS, Brazil.
Pimentel, E. C. G., Cardoso, V., Carvalheiro, R., Queiroz, S. A. and Fries, L. A.
2004. Predições de desempenho de gerações avançadas conforme diferentes modelos
de avaliação de animais cruzados. In: Reunião Anual da Sociedade Brasileira de
Zootecnia, 41. Campo Grande-MS, Brazil.
Pollak, E. J. and Quaas, R. L. 1998. Multibreed genetic evaluations of beef cattle.
Proc. 6th World Cong. Genet. Appl. Livest. Prod., Armidale, NSW, Australia. 23: 81-
88.
Robinson, D. L. 1996. Estimation and interpretation of direct and maternal genetic
parameters for weights of Australian Angus cattle. Livest. Prod. Sci. 45: 1-11.
Rodríguez-Almeida, F. A., Van Vleck, L. D. and Gregory, K. E. 1997. Estimation of
direct and maternal breed effects for prediction of expected progeny differences for
birth and weaning weights in three multibreed populations. J. Anim. Sci. 75: 1203-
1212.
Roso, V. M. and Fries, L. A. 1998. Maternal and individual heterozygosities and
heterosis on pre-weaning gain of Angus x Nelore calves. Proc. 6th World Cong.
Genet. Appl. Livest. Prod., Armidale, Australia.
Roso, V. M., Schenkel, F. S. and Miller, S. P. 2004. Degree of connectedness among
groups of centrally tested beef bulls. Can. J. Anim. Sci. 84: 37-47.
SAS. 1990. SAS/STAT User’s Guide (Version 6). SAS Inst. Inc., Cary, NC.
Sheridan, A. K. 1981. Crossbreeding and heterosis. Anim. Breed. Abstr. 49: 131-144.
Smith, C. 1984. Rates of genetic change in farm livestock. Res. Develop. Agric. 1: 79-85.
Sorensen, D. and Gianola, D. 2002. Likelihood, Bayesian, and MCMC methods in
quantitative genetics. Springer-Verlag New York, Inc. New York. 740pp.
Sullivan, P. G., Wilton, J. W., Miller, S. P. and Banks, L. R. 1999. Genetic trends and
breed overlap derived from multiple-breed genetic evaluation of beef cattle for growth
traits. J. Anim. Sci. 77: 2019-2027.
Weisberg, S. 1985. Applied linear regression. 2nd ed. John Wiley and Sons, Inc., New
York. 324pp.
Wood, C. M., Christian, L. L. and Rothschild, M. F. 1991. Evaluation of performance-
tested boars using single-trait animal model. J. Anim. Sci. 69: 3144-3155.