Computing algorithm for dairy sire evaluation
on
several lactations considered
asthe
sametrait
B. BONAITI Michèle BRIEND 1.N.R.A., Station de
Génétique quantitative
etappliqu g e,
Centre de Recherches
Zootechniques,
F 78350Jouy-en-Josas
Summary
A
computing algorithm
issuggested
fordairy
sire evaluation on several lactations consideredas the same trait when the model must include
herd-year (HY),
cow and sire as well as otherenvironmental effects that HY
(ENV).
Afterdescription
ofequations leading
to estimates of thedifferent effects and of available
computing
methods, someimprovements
areproposed : 1)
Amethod for cow
equations absorption
is described.2)
Instead ofabsorption
of HYequations
whichis
highly
timeconsuming, computing
of HY, ENV and sire effectsby
a block iterativeprocedure,
is
suggested. 3) Expressing
all the former records as deviations fromprevious
HY and ENV estimates, isproposed
to combine former and recent data sets for sire evaluation withoutincreasing
too much thecomputing length.
Key
words :Breeding
value, BLUP,dairy
cattle,computing algorithm.
Résumé
Algorithme
de calcul pour l’évaluation de la valeurgénétique
des taureaux laitierssur
plusieurs
lactations considérées comme un seul caractèreUne méthode de calcul est
proposée
pour l’indexation des taureaux laitiers surplusieurs
lactations, considérées comme un seul caractère,quand
le modèled’analyse
doit tenir compte des effetstroupeau-année (HY),
d’environnement autres que HY(ENV),
vache etpère. Après
uneprésentation
deséquations
conduisant aux estimations des différents effets et des méthodes de résolution,quelques
améliorations sontproposées : 1)
Une méthode est décrite pourl’absorption
des
équations
vache.2)
Au lieu de recourir àl’absorption
deséquations
HY,qui.
serait troplongue,
il estpossible
d’obtenir les solutionscorrespondantes
aux effets HY, ENV etpère
par uneprocédure
itérative.3)
Enexprimant
lesperformances
antérieures en écart aux effets HY et ENV, à l’aide des solutions obtenues lors des calculs antérieur3, on propose de combiner les données anciennes et récentes pour l’estimation de la valeurgénétique
des taureaux sans trop augmenter lacomplexité
des calculs.Mots clés : Valeur
génétique,
BLUP, bovins laitiers,algorithme
de calcul.I. Introduction
The theoretical
principles
for estimation ofbreeding
values were establishedby
Lus
H
(1931)
and laterperfected by
HENDERSON(1973)
with the Best Linear Unbiased Predictor(BLUP). Computations providing
BLUP estimates are very similar to those of least squares, and manyapplications
havealready
been made in differentspecies
andfor different characters. For
large
data sets, andespecially
foranalysis
withcomplicated models, computations
can be very timeconsuming.
InFrance,
analgorithm
such asproposed by
UFFORD et al.(1978)
fordairy
sire evaluation with several lactations has not been used for 2 main reasons.First,
the model had to include other environmental factors than those of herd effects.Second, including
thecomplete
data set in eachanalysis,
asrequired
for severallactations,
would have led to excessivecomputing.
ForFrench
dairy
sireevaluation,
PouTous et al.(1981)
use an easier method based on data from the last three yearsonly.
This method enables to handle alarge
model because it does notrequire setting
up of the coefficient matrix. Its 2 main features are an estimate of each effect obtained from aregressed
mean deviation of data corrected for the other effectsby
estimates fromprevious analysis
and a « selection »factor,
at each level of which cows are rankedaccording
to their first lactationdeviation,
toprevent
cow effects in the model.An alternative to this
procedure currently applied
inFrance,
isproposed
in thispaper. The BLUP
principles
are maintained but some of theapproximations
of theFrench
dairy
sire evaluation method areadopted.
II. BLUP
equations
Four main sources of variation are
usually
considered in theanalysis
ofdairy
fieldrecords :
- the
sire,
and in some cases, the maternalgrandsire,
- the
herd-year-season
orherd-year
effects(HY),
- the cow, if several lactations are considered for the same cow,
- and a set of other factors called
ENV,
related to theenvironment,
butindependant
of HY. These factors can be month ofcalving,
age andparity. Usually, they
do not appear in the model ofanalysis
as the data can be corrected for these factorsprior
to theanalysis. However,
inFrance, they
have been included in the model from the onset ofdairy
sire evaluation.The
following
linear model can then be chosen for theanalysis
of data and sire evaluationby
the BLUPprocedure :
where p, m,
h,
crepresent
vectors ofsire, ENV,
HY and cow within sireeffects, S, T,
R and Z the related
design
matrices. Vector Erepresents
random residual effects and is assumed to bemultinormally
distributed with covariance matrixV.oe.
Matrix V is assumed to bediagonal
and the elementcorresponding
to the 1‘&dquo; record of the kllcowis :
Thus, complete
andincomplete
records may begiven
differentweights (w!) according
to lactationlength
as in Frenchdairy
sire evaluation(P OUTOUS
etal., 1981).
Sire
(p)
and cow(c)
effects are also assumed to be random effects withexpected
zerovalue. If
ay
and r are variance andrepeatability
ofrecords,
if A is the numeratorrelationship
of the sires and if the sire variance is 1/4 additivegenetic variance,
wehave :
I
With
these assumptions, the sire evaluation according
to the BLUP methodology,
is derived
from :
U
FFORD et al.
(1978)
and SCHAEFFER(1975)
described efficient methods to be used when the ENV effects are not in the model and when cow effects can be considered to be within herd nested. The 2 mainsteps
are :-
absorption
of cow and then HYSequations,
- solution of the
resulting equations by
an iterativeprocedure.
III.
Adaptation
to model withsire, herd-year
and other environmental effects(ENV)
The 2 successiveabsorptions
of cow andherd-year equations
are more difficultwhen the ENV effects are considered in addition to sire effects. On the one
hand,
theresulting equations
is toolarge
to be set up within corestorage.
If each element must be stored onperipheral storage equipment
and accumulatedlater,
then the number of these elements is toolarge.
Inaddition,
cow effects are not nested within all the other effects. Someadaptations
can then make the sire evaluation easier.The set of
equation (I)
can be written :and
A f
is a blockdiagonal matrix,
with the same dimensions asU’V-’U,
in which the upper block relative to sire effect(p)
is kA-’ and the others are zero matrices.Later on, f is
split
up into differentfactors !
so that each row of the relateddesign
matricesU.
hasonly
one non zeroelement,
which is alsoequal
to 1. Forexample, f_
mayrepresent months,
age or sire effect. If a level of any factorf_
isrelated to the u’&dquo; column of matrix
U,
sums ofweights (w,,)
relative to this level(u)
will be w! and wku’
respectively
for the whole data set and the kll cow. The relatedsums for a combination of 2 levels
(u
andu’)
will be w.., and wkuu’respectively.
A. Cow
equation absorption
absorption
of cowequations
leads to :In order to
get
elements p[u;u’]
and s[u],
it ispossible
to :o
compute
the 2quantities
a for each cow, cumulate them in p
[u ; u’]
and s[u].
But this method is efficient when values of Wku are
large
as in the case ofabsorption
ofherd-year equations.
For coweffects,
there aremostly
one record per cell definedby
the combination of levels u and k. In this case,adding separetely
for eachrecord a certain
quantity
to the related element of p or s withoutcomputing
thefigures,
wku’ wkuu’l Yk! for each cow, is more efficient. This ispossible
with analgorithm (derived
from resultsgiven
inappendix)
still reliable even if 2 or more records of thesame cow appear within the same level of any factor.
Using
the notation(khp)
toidentify
the row or the column in p or s related to the level of thefactor !
for the 1’&dquo; record of the kll cow, thealgorithm proposed here,
consists in the
following
additions for each cow : . for each record1,
add :wk!
(Y k]
-m k )
to s[(klcp)]
for all factors ’1’.( Wkl
- W 2 k l/( Wk
+a))
to p[(klcp) ; (klcp’)]
for all orderedpairs
of sub-factors(o, cp’)
with tpequal
or not too’.
. for each ordered
pairs
of records(I, I’)
with1 ! 1’,
add —( Wk] Wk /( Wk
+a))
top
[(klcp) ; (kl’cp’)]
for all the orderedpairs
of sub-factors( y , cp’) with
oequal
or. not to
’1 &dquo;.
B.
Absorption of
herd yearequations
or block iterativeprocedure
1.
Absorption procedure
As the
equation system (III)
remains toolarge,
anotherabsorption
ofherd-year equations
isusually suggested
to reduce the size.The
principle
of thisoperation
is thefollowing.
Let :A
.
be blockdiagonal matrix,
with the same dimensions as0,
in which the block relative to sire effect(p)
is kA-’ and the others zero.Equation
III can be written in another form :Absorption
ofherd-year equations
leads to :The cows are assumed to be nested within herds and matrix
Q,
is blockdiagonal.
If matrices
Q i , Q 2’ Q 3’
r,, r, aresplit
up,according
toherd,
intoQ,,, Q 2j’ () 3j , rj,
andr 2j respectively
in thefollowing
way :the two members of the
equation (V)
can be derived from :However,
thisabsorption
appears to behighly
timeconsuming mainly
because ofthe
expression Q 2j Q -1 0’!
for each herd. Forexample,
for a modelincluding
10 year effects and a vector g of 150levels, computing
needs 0.5 seconds per herd and therefore about 7hours,
for the 50 000 herds in the Frenchdairy recording
data set.For that reason this method cannot be
easily
used when alarge
model isapplied
to alarge
data set.2. Block iterative
procedure
Instead of an
absorption procedure,
one may use a block iterative method in which the solution of theequation
IV is derived at the n’&dquo; iteration from the solution of theprevious
iteration :The
following relationship
exists between two consecutive solutions of g :which is not very different form that
usually
used whenequation (V)
from theabsorption procedure
is solvedby
the Gauss-Seidel iterative method.But,
for 3reasons,
computations
of the solutions may be faster with the block iterative proce- dure :a) Computing
of0! Q3-’ Q’2
is not necessary with this method.b)
Matrix0,,
which is blockdiagonal,
may be invertedonly
once, at the firstiteration,
and then stored. This may also be the case of matrix(Q,
+Llg)
if the size of vector g is small or if therelationship
matrix A is not considered.c)
Theright
hand side coefficients can be written in another form :This may be
easily
obtained from theprevious algorithm
relative to cowabsorption
on a variable corrected for
g< n -’>
or h<&dquo;’.Therefore,
after the firstiteration, only
theright
hand side coefficients have to be recalculated.Computing length
of the block iterativeprocedure depends mainly
on the number of iterationsteps required
to reach anacceptable
solution. This is related to the convergence towards zero of :for which no
general
method of evaluation is available.3. Numerical
comparison
betweenabsorption
and block iterativeprocedure
According
to the French sireevaluation,
thespeed
of convergence of A(&dquo;)might
bevery
good.
Thus the 2procedures (absorption/block iterative)
werecompared
on arather
large
data set(300
000records) prepared
with the first 3 lactations of 3 Frenchdepartments
between 1976 and 1981. Data wereanalysed according
to the 2following
models
where
Y;!!km!y
milkproduction
inkg.
HY;! :
fixed effect of i‘&dquo; herd andj’&dquo;
year.YSPjkl :
fixed effect of phparity
and of kthcalving
season withinj’ h
year(3 x 4 x 5 levels).
YSM ik
.:
fixed effect of m’&dquo; month ofcalving
within klh season andj ll
year(3 x 4 x 5 levels).
V
1n
: fixed effect of n’&dquo; class of age orcalving
interval(for
lactation 2 and3)
within l’h
parity (10
x 3levels).
cic
: random effect of c’&dquo; cow within i’&dquo; herd withexpected
value zero andvariance
o<.
S,
: random effect of Slh sire withexpected
value zero andvariance . (4557 sires).
c
i.
: the same asCic
but within ith herd and s’h sire.Solutions relative to least squares
(model I)
or BLUP(model II) equations
wereobtained with the 2
methods, absorption
and block iterativeprocedures (tabl.
1 et2).
Block iterative estimates
rapidly approximated
thoseresulting
from theabsorption procedure.
With modelI,
the root mean square of the error(difference
between block iterative andabsorption solutions) quickly
decreases. At the fourth iteration the maximum error was less than 2kg
and the root mean square less than 1kg.
Convergence
of sire solutions was not asquick
with modelII, probably
because ofsome assocation between herds and bulls. After seven
iterations,
the root mean square of the error was 2.7kg,
and the maximum error 8.6kg. Comparison
ofcomputing
times for model I and II shows that the block iterative
procedure
was much moreefficient
(tabl. 4).
In
practice,
the values of most effectsbeing
well known before the firstiteration,
the number of iterations needed to
get good
solutions could be very small(2
or3).
This enhances the
advantage
of the block iterativeprocedure
because itscomputing requirements mostly depend
on the number of iterations whereas thecomputing
timefor the
absorption procedure depends
on theabsorption
itself.However,
use of theabsorption procedure
should not be excluded with model II. The fact that matrixQ,
isdiagonal disappears
if therelationship
matrix(A)
is used in theanalysis.
An association between herds and bullsmight require
such alarge
number of iterations that theabsorption procedure
may become more efficient in somepractical
conditions.The results from models I and II
prompted
us totry
a model IIIincluding
alleffects of both models I or II :
As the
absorption procedure
would have needed creation of toolarge
amatrix, only
the block iterativeprocedure
was used. At eachiteration,
theherd-year
effect(HY;!),
ENV effect( y s p jkll Ysm jk . 1 V 1n )
and sire effect weresuccessively computed.
Differences between successive solutions
give
some information about thespeed
ofconvergence towards exact solutions. Statistical
parameters
of these differences werecomputed separately
for each of the effects :YSP jk l’ Ysmjk., V kl
andS,.
At the 8’hiteration the root mean squares of the difference were smaller than 2
kg
of milk for all the effects.Particularly
the root mean square ofdifference
between the 7‘&dquo; and 8‘&dquo; sire effect solutions wasonly
1kg.
The maximum differences were less than 4kg
for theeffect
YSP!!&dquo; YSM jkm
andV,,
and 8kg
for the sire effect(tabl. 3).
As no
comparison
with exact solutions(obtained
from theabsorption procedure)
ispossible,
the value of the block iterativeprocedure
cannot beaccurately
established.However,
theonly
other solution would be :-
preparing
a set of ENV effect estimates from ananalysis
of data without sireeffect,
-
correcting
dataaccording
to these ENV effectestimates,
-
analysing
dataaccording
to model II.In
comparison,
our method described aboveprovides
after a few iterations asolution of the ENV effects
independent
of the sire effect. This is anadvantage
whenthere is a
relationship
between some of the ENV effects(months
or age atcalving)
andthe sire effects.
Therefore,
because of the shortcomputing
time(tabl. 4),
this method may be used toanalyse
datasimultaneously
for the 3types
of effects(ENV, herd-year
and sire
effects).
IV.
Proposal
tosimplify computing
on former dataAnother
problem
is related to the size of the data set studied.Although
ananalysis
is made every year, it is necessary toanalyse
data over many years so as to obtain :- an accurate evaluation of former bulls
through
a combination of former and recentinformation,
- an estimation of HY effects
independent
ofgenetic
differences betweenherds,
- an accurate evaluation also for bulls in progeny
testing,
- an estimation of
genetics
trends.With first lactations
only,
information from different years can be accumulatedby
the addition of different sets of
equation,
becauseherd-year equations absorption
may be done within a year. With severallactations, absorption
of cow andherd-year equations
cannot be done within a year.Analysis
of data from many years therefore involvesprocessing
of all data withoutusing previous computing.
Thisquickly
becomesimpossible
when many years of data are available.Another method based on an
approximation already
used in the Frenchdairy
sireevaluation
system (Pou T ous
etal., 1981), splits
the data into 3 groupsaccording
to2
criteria :
Active record : a record initiated no
longer
than p years in thepast.
Active cow : a cow which has at least one active record.
The 3 groups are defined as followed :
Group
1 : cow and record bothinactive,
Group
2 : cow active and recordinactive,
Group
3 : cow and record both active.Matrices
Y, U, R, T, S, Z,
E aresplit
up, each in the same way, into 3 sub- matricesaccording
to the 3 groups. Forexample :
equation III,
obtained from cowequations absorption,
can be written :The
off-diagonal
elements inB,
whichcorrespond
to one record from group1,
on the onehand,
and another from group 2 or3,
on the otherhand,
arealways
zerobecause
they
are not from the same cow. Matrix B can then we written as a directsum :
where
B,
andB2J
are cowequation absorption
matrices defined as before for B and relative to group 1 and groups 2 and3, respectively.
We have therefore :
Matrices
U’,B,U,
andU’,B,Y&dquo;
whichgive
all information from group 1 remain verylarge
asthey correspond
to ENV(m),
sire(p)
andherd-year (h)
effects. A reduction in size cannot be obtainedby absorption
ofherd-year equations,
as thisoperation
should beperformed
on the whole set ofequations.
To facilitate the
computing,
values ofherd-year
and ENV effects(h
andm)
relative to inactive records
(group
1 and2)
are assumed to be known fromprevious
sire evaluation runs
(estimates
h* andm * ).
Thisrequires
that ENV effects(m)
aredefined on a within year basis.
Therefore
analysis
can be made on thefollowing
variables :with the
following analysis
model :Information from the first group of data can then be summarized with the smaller matrix :
U’,B,U,
=S’,B,S,
which is adiagonal
matrix.U’,B,Y,
=S y B,Y,
Moreover,
data from the second group(Ý 2 ) (inactive performance
of activecows) only
contribute tocomputing
mk and the sum w, ofweights
wk, because the modelonly
includes the sire factor. This may also allow to reduce the size of the data set
actually
used for sire evaluation.
Our
approximation might
biasbreeding
values if differences betweenprevious
estimates
(h *
andm * )
and exhaustive estimates(from
theprocessing
of alldata)
arelarge
and not distributed at random. Biascorresponding
to cowculling might
not befully prevented
like after accurateapplication
of BLUPprocedures.
The main risk is abad estimation of
genetic
trends and therefore of differences between bulls used in different years. Extent of biasdepends
on the choice of the pvalue,
i.e. the number of yearsduring
which records are considered as active.Indeed,
the best is thehighest possible
p valueaccording
tocomputing
facilities. Thisrequires
further studies which could involvechecking
ofresults, mainly
estimatedgenetic trends, according
to thechoice of the p value or in
comparison
to the first lactation sire evaluation which caneasily
beapplied
to all data.V. Conclusion
Dairy
sire evaluationaccording
to Henderson’s BLUPmethodology
is difficult when several lactations of one and the same cow areanalysed by
means of a modelinvolving
notonly
the usual effects(sire,
year, herd andcow),
but also otherenvironmental effects such as month and age of
calving
orparity.
Arigorous applica-
tion on the
large dairy recording
files seems to beimpossible
with thepresent computation possibilities. Thus,
rather thansimplifying
the model ofanalysis
whichwould be the
only
way ofexactly applying
the BLUPprinciples,
this paper describes acomputing algorithm allowing
topartly
solve the difficultiesowing
to 2approximations.
However,
thevalidity
of the secondapproximation
wasonly partly
shown. The main risk is a bad estimation ofgenetic
trends and therefore of differences between bulls used in different years.Thus,
further research is needed to evaluate the extend of this bias and to allow thecomputing algorithm proposed
in this paper to be used.Received October
26,
1982.Accepted July 3,
1985.References H
ENDERSON C.R., 1973. Sire evaluation and
genetic
trends.Proceeding of the
AnimalBreeding
andGenetics
Symposium
in honorof Dr. Jay
L. LusH,Blacksburg, Virginia, July
29, 1972, 10-41, ASA-ADSA,Champaign,
Illinois.LusH J.L., 1931. The number of
daughters
necessary to prove a sire. J.Dairy
Sci., 14, 209.PouTous M., BRIEND Michèle, CALOMITI S., DOAN D., FELGINES C., STEIERG., 1981. M6thode de
calcul des index laitiers. Bases
generates,
Bul. Tech.Ing.,
361, 433-446.S
CHAEFFER L.R., 1975.
Dairy
sire evaluationfor
milk andfat production.
46 pp.,University
ofGuelph. Mimeograph.
S
EARLE S.R., 1965. Matrix
Algebra for
thebiological
sciences. 296 pp.,Wiley,
New York.U
FFORD G.R., HENDERSON C.R., VAN VLECK L.P., 1978. Derivation of
computing algorithms
forsire evaluation,
using
all lactation records and natural service sires. 46 pp., Animal ScienceMineograph
Series n° 39. CornellUniversity,
Ithaca, NY.Appendix
Data must be
analysed according
to the model :Vector f
represents
a group of several effects :with U =
(U,, U,,
...U,,, ...)
the relateddesign
matrices. The definition off_
is suchthat each row of the related
design
matrixU_
hasonly
one non zeroelement,
alsoequal
to 1.U
[(kl) ;] represents
the row of Ucorresponding
to the 1’&dquo; record of the k’&dquo; cow.Vectors E and c
represent
random effects with :Matrix V is a
diagonal
matrix withThe mixed model
equation
is :absorption
of cowequations gives :
p and s may be
computed using
thefollowing
results :Let nk the number of records of the k‘&dquo; cow and Wk =
l l w ki
J
nk
a matrix(n k xn k )
whose elements are allequal
to oneU
k
andV,
the submatrices of U and Vcorresponding
to the k’&dquo; cowWe have :
Each element of p can then be
computed
from theexpression :
In the same way for s
Notation : fl3 and