Analysis of residuals and adjustment in JRA
A. Oliveira¹, T. Azinheira Oliveira¹,², João T. Mexia³
¹UAb, Department of Exact and Technologic Sciences, R. Fernão Lopes, 9, 2º D, 1000-132 Lisbon, Portugal, aoliveira@univ-ab.pt, toliveir@univ-ab.pt
²CEAUL, Center of Statistics and Applications, University of Lisbon
³Department of Mathematics, FCT - New University of Lisbon, Quinta da Torre, 2825 Monte da Caparica, Portugal, jtm@fct.unl.pt
SUMMARY
Joint Regression Analysis (JRA) is based on linear regression applied to yields, with one linear regression adjusted per cultivar. The environmental indexes in JRA correspond to a non-observable regressor that measures the productivity of the blocks in the field trials. The zig-zag algorithm is usually used in the adjustment. In this algorithm, minimizations for the regression coefficients alternate with those for the environmental indexes. The algorithm has performed very well in practice, but a general proof of convergence to the absolute minimum of the sum of squares of the residuals is still lacking. We now present a model for the residuals that may be used to validate the adjustments carried out by the zig-zag algorithm.
Key words: Joint Regression Analysis, zig-zag algorithm, residuals.
1. Introduction
JRA is a well-known technique for the comparison of cultivars. This technique is based on the adjustment of linear regressions, one per cultivar; see Mexia et al. (2001). Each regression has the yield as dependent variable. The zig-zag algorithm is always applicable and usually performs well, see Mexia and Pinto (2004), Pereira (2003), Pereira and Mexia (2003), Pereira and Mexia (2003a) and Pinto (2005), but it does not guarantee convergence to the absolute minimum of the sum of squares of the residuals. We now present a linear model for the residuals to validate the adjustment obtained, and discuss the significance of the results thus obtained.
2. Zig-zag Algorithm
Joint Regression Analysis rests on the adjustment of linear regressions $\alpha_j + \beta_j x$, $j = 1, \dots, J$, of the yields of the different cultivars on a synthetic measure of productivity: the environmental index.
Let $y_{i,j}$ be the yield of cultivar $j$ on block $i$ if that cultivar is present in that block; otherwise $y_{i,j}$ may take any value.
With $x_i$ the environmental index for block $i$, we are led to minimize
$$S = \sum_{j=1}^{J} \sum_{i=1}^{b} q_{i,j} \left( y_{i,j} - \alpha_j - \beta_j x_i \right)^2 \qquad (2.1)$$
where $q_{i,j} = 1$ [0] when cultivar $j$ is present [not present] in block $i$.
We point out that both the regression coefficients $(\alpha_j, \beta_j)$, $j = 1, \dots, J$, and the environmental indexes are unknown, so that we have to minimize with respect to both the regression coefficients and the environmental indexes. We thus obtain the corresponding least squares estimators.
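As an illustration of the goal function (2.1), the following minimal Python sketch evaluates $S$ for given coefficients and indexes. The function name `jra_sum_of_squares`, the array layout (blocks in rows, cultivars in columns) and the toy numbers are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def jra_sum_of_squares(y, q, alpha, beta, x):
    """Goal function S of (2.1).

    y, q  : (b, J) arrays of yields and 0/1 presence indicators
    alpha : (J,) intercepts, beta : (J,) slopes
    x     : (b,) environmental indexes
    """
    fitted = alpha[None, :] + np.outer(x, beta)   # alpha_j + beta_j * x_i
    return float(np.sum(q * (y - fitted) ** 2))

# Toy data (hypothetical): 3 blocks, 2 cultivars, yields exactly on the lines.
x = np.array([1.0, 2.0, 3.0])
alpha = np.array([0.5, -1.0])
beta = np.array([1.0, 2.0])
y = alpha[None, :] + np.outer(x, beta)
q = np.array([[1, 1], [1, 0], [1, 1]])
y[1, 1] = 999.0          # cultivar 2 absent in block 2: the value is arbitrary

S_masked = jra_sum_of_squares(y, q, alpha, beta, x)              # absent cell ignored
S_unmasked = jra_sum_of_squares(y, np.ones_like(q), alpha, beta, x)
```

Because the absent cell has $q_{i,j} = 0$, the arbitrary value stored there does not contribute to `S_masked`, which is the point of the remark that $y_{i,j}$ "may take any value".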
One technique that has been extensively used to minimize $S$, and thus adjust both the regression coefficients and the environmental indexes, is the zig-zag algorithm. To apply this algorithm we start by choosing a reasonable set of initial values for the environmental indexes. Usually the block average yields may be used. Alternatively, if we are using an α-design, the super-block average yields can be used as initial values for the environmental indexes of the constituting blocks. Let $x'$ and $x''$ be the smallest and largest initial values for the environmental indexes.
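ZIG-ZAG ALGORITHM
Choose $x_1^n = (x_{1,1}, \dots, x_{1,n})$.
Minimize $S(\alpha^J, \beta^J \mid x_1^n)$ to get $\tilde{\alpha}^J(x_1^n)$, $\tilde{\beta}^J(x_1^n)$.
To minimize $S(x^n \mid \tilde{\alpha}^J(x_1^n), \tilde{\beta}^J(x_1^n))$, minimize the functions $h_i(x)$, $i = 1, \dots, n$, to get $x_1'^{\,n} = (x'_{1,1}, \dots, x'_{1,n})$.
Standardize taking
$$a_1 = \min\{x_{1,1}, \dots, x_{1,n}\}, \quad b_1 = \max\{x_{1,1}, \dots, x_{1,n}\}, \quad a'_1 = \min\{x'_{1,1}, \dots, x'_{1,n}\}, \quad b'_1 = \max\{x'_{1,1}, \dots, x'_{1,n}\}$$
$$x_{2,i} = a_1 + \frac{b_1 - a_1}{b'_1 - a'_1} \left( x'_{1,i} - a'_1 \right), \quad i = 1, \dots, n,$$
to get $x_2^n$.
Repeat this procedure until successive sums of squares of weighted residuals differ by less than a fixed amount.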
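The steps of the algorithm box can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights are taken to be the 0/1 indicators $q_{i,j}$, the functions $h_i$ are assumed to be the per-block parts of $S$, that is $h_i(x) = \sum_j q_{i,j}(y_{i,j} - \tilde{\alpha}_j - \tilde{\beta}_j x)^2$, and the names `zigzag`, `tol` and `max_iter` as well as the simulated data are hypothetical.

```python
import numpy as np

def zigzag(y, q, x0, tol=1e-10, max_iter=1000):
    """Alternate least squares for (alpha_j, beta_j) and for the indexes x_i."""
    y, q = np.asarray(y, float), np.asarray(q, float)
    x = np.asarray(x0, float).copy()
    a, b = x.min(), x.max()              # range [x', x''] kept unchanged
    J = y.shape[1]
    alpha, beta = np.zeros(J), np.zeros(J)
    S_prev = np.inf
    for _ in range(max_iter):
        # Step 1: for fixed x, one weighted simple regression per cultivar.
        for j in range(J):
            w = q[:, j]
            xm = np.sum(w * x) / w.sum()
            ym = np.sum(w * y[:, j]) / w.sum()
            beta[j] = np.sum(w * (x - xm) * (y[:, j] - ym)) / np.sum(w * (x - xm) ** 2)
            alpha[j] = ym - beta[j] * xm
        S = float(np.sum(q * (y - alpha - np.outer(x, beta)) ** 2))
        if S_prev - S < tol:             # stopping rule of the box
            break
        S_prev = S
        # Step 2: for fixed coefficients, each h_i is a quadratic in x
        # with a closed-form minimizer (assumed form of h_i, see lead-in).
        x_new = np.sum(q * beta * (y - alpha), axis=1) / np.sum(q * beta ** 2, axis=1)
        # Step 3: rescale the new indexes back to the initial range [a, b].
        a1, b1 = x_new.min(), x_new.max()
        x = a + (b - a) * (x_new - a1) / (b1 - a1)
    return alpha, beta, x, S

# Simulated example (hypothetical numbers): 8 blocks, 3 cultivars, 2 absent cells.
rng = np.random.default_rng(0)
n_blocks, n_cult = 8, 3
x_true = np.linspace(1.0, 5.0, n_blocks)
y = (np.array([0.5, -1.0, 2.0]) + np.outer(x_true, [0.8, 1.0, 1.2])
     + 0.05 * rng.standard_normal((n_blocks, n_cult)))
q = np.ones((n_blocks, n_cult))
q[0, 2] = 0
q[5, 1] = 0
x0 = np.sum(q * y, axis=1) / q.sum(axis=1)   # block average yields as initial values
alpha_hat, beta_hat, x_hat, S_hat = zigzag(y, q, x0)
```

Since an affine rescaling of the indexes can be absorbed by the regression coefficients, the value of $S$ computed after each regression step does not increase from one iteration to the next in this sketch; this says nothing about reaching the absolute minimum.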
The zig-zag algorithm is an iterative technique. Its name comes from the fact that, in each iteration, the minimization is carried out first for the regression coefficients and then for the environmental indexes. At the end of each iteration, the values obtained for the environmental indexes are rescaled to keep the range $[x'; x'']$ unchanged. The algorithm is presented above.
3. Model for residuals
As stated above, the zig-zag algorithm usually performs quite well, but we still do not have a proof that it converges to the absolute minimum of the goal function.
Suppose the adjustment were defective, so that, with $(\alpha_j, \beta_j)$ and $x_i$ [$(\tilde{\alpha}_j, \tilde{\beta}_j)$ and $\tilde{x}_i$] the exact [adjusted] regression coefficients and environmental indexes, we would have
$$\alpha_j = \tilde{\alpha}_j + \gamma_j, \quad \beta_j = \tilde{\beta}_j + \eta_j, \quad j = 1, \dots, J; \qquad x_i = \tilde{x}_i + u_i, \quad i = 1, \dots, b \qquad (3.1)$$
with non-negligible deviations $(\gamma_j, \eta_j)$ and $u_i$. We are thus led to test the significance of these deviations.
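To carry out these tests we consider a model for the regression residuals
$$\mathring{y}_{i,j} = y_{i,j} - \left( \tilde{\alpha}_j + \tilde{\beta}_j \tilde{x}_i \right) \qquad (3.2)$$
Replacing the $(\tilde{\alpha}_j, \tilde{\beta}_j)$ and $\tilde{x}_i$ by their expressions we get
$$\mathring{y}_{i,j} = \gamma_j + \tilde{x}_i \eta_j + \tilde{\beta}_j u_i + \eta_j u_i \qquad (3.3)$$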
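The step from (3.2) to (3.3) can be written out as follows. The derivation assumes that the yields follow $y_{i,j} = \alpha_j + \beta_j x_i + e_{i,j}$, with the error term $e_{i,j}$ written explicitly here (an assumption of this worked step; in the text the error vector only appears with the matrix form of the model).

```latex
\begin{aligned}
\mathring{y}_{i,j}
  &= y_{i,j} - \tilde{\alpha}_j - \tilde{\beta}_j \tilde{x}_i \\
  &= (\tilde{\alpha}_j + \gamma_j) + (\tilde{\beta}_j + \eta_j)(\tilde{x}_i + u_i) + e_{i,j}
     - \tilde{\alpha}_j - \tilde{\beta}_j \tilde{x}_i \\
  &= \gamma_j + \tilde{x}_i \eta_j + \tilde{\beta}_j u_i + \eta_j u_i + e_{i,j}.
\end{aligned}
```

The only term that is not linear in the unknown deviations is the product $\eta_j u_i$, which is of second order when both deviations are small.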
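Assuming that the terms $\eta_j u_i$ may be discarded, we may write the model (3.3) in matrix form as
$$\mathring{y} = \mathring{X} \theta + e \qquad (3.4)$$
where
$$\mathring{y} = [\, \mathring{y}_1' \; \cdots \; \mathring{y}_J' \,]', \qquad \mathring{X} = [\, I_J \otimes \tilde{X} \;\;\; \tilde{\beta} \otimes I_b \,], \qquad \theta = [\, \gamma_1, \eta_1, \dots, \gamma_J, \eta_J, u_1, \dots, u_b \,]' \qquad (3.5)$$
with $\mathring{y}_j = (\mathring{y}_{1,j}, \dots, \mathring{y}_{b,j})'$ and $\tilde{\beta} = (\tilde{\beta}_1, \dots, \tilde{\beta}_J)'$, and $e$ represents the vector of errors connected with the $y_{i,j}$, with the usual assumptions. We have
$$\tilde{X} = \begin{bmatrix} 1 & \tilde{x}_1 \\ \vdots & \vdots \\ 1 & \tilde{x}_b \end{bmatrix} \qquad (3.6)$$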
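To make the Kronecker structure of the design matrix concrete, the sketch below builds it with NumPy and checks, on toy numbers, that multiplying it by the parameter vector reproduces the linearized model $\gamma_j + \tilde{x}_i \eta_j + \tilde{\beta}_j u_i$ cell by cell. The function name `residual_design_matrix`, the cultivar-by-cultivar stacking order and all numerical values are assumptions made for illustration.

```python
import numpy as np

def residual_design_matrix(x_adj, beta_adj):
    """Design matrix [I_J (x) [1, x_adj]   beta_adj (x) I_b] of the residual model.

    Rows are ordered cultivar by cultivar (all blocks of cultivar 1 first),
    columns as (gamma_1, eta_1, ..., gamma_J, eta_J, u_1, ..., u_b).
    """
    x_adj = np.asarray(x_adj, float)
    beta_adj = np.asarray(beta_adj, float)
    b, J = x_adj.size, beta_adj.size
    block = np.column_stack([np.ones(b), x_adj])            # b x 2 matrix of (3.6)
    left = np.kron(np.eye(J), block)                        # (J*b) x (2*J)
    right = np.kron(beta_adj.reshape(-1, 1), np.eye(b))     # (J*b) x b
    return np.hstack([left, right])

# Toy values (hypothetical): 3 blocks, 2 cultivars.
x_adj = np.array([1.0, 2.0, 3.0])
beta_adj = np.array([0.9, 1.1])
gamma = np.array([10.0, -5.0])
eta = np.array([0.01, -0.02])
u = np.array([1.0, -2.0, 0.5])

X = residual_design_matrix(x_adj, beta_adj)
theta = np.concatenate([np.column_stack([gamma, eta]).ravel(), u])

# Direct evaluation of gamma_j + x_i * eta_j + beta_j * u_i, stacked by cultivar.
direct = (gamma[:, None] + np.outer(eta, x_adj) + np.outer(beta_adj, u)).ravel()
stacked = X @ theta
```

Here `stacked` and `direct` agree, which confirms that the column ordering chosen for the parameter vector matches the two Kronecker blocks.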
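The least squares estimator for $\theta$ will be
$$\tilde{\theta} = \left( \mathring{X}' \mathring{X} \right)^{-1} \mathring{X}' \mathring{y} \qquad (3.7)$$
and the sum of squares of the residuals for the adjusted model will be
$$SS_{model} = \mathring{y}' \mathring{y} - \mathring{y}' \mathring{X} \tilde{\theta} \qquad (3.8)$$
To evaluate the quality of the zig-zag adjustment we can use
$$R^2 = 1 - \frac{SS_{model}}{SS_{zig\text{-}zag}} \qquad (3.9)$$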
where $SS_{zig\text{-}zag}$ is the sum of squares of the original residuals. Contrary to what is usual, we are interested in low values of $R^2$. These indicate that no significant amount of information was extracted by the model adjusted for the residuals. This lack of information carried by the residuals validates the initially adjusted model, since that model would then carry the relevant information.
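A minimal sketch of how $SS_{model}$ and $R^2$ could be computed from a residual vector and a design matrix is given below. It is an illustration under assumptions, not the authors' code: the name `residual_model_r2` and the synthetic data are hypothetical, and `np.linalg.lstsq` is used in place of an explicit inverse because, in this construction, the columns associated with the $u_i$ are not linearly independent of the cultivar columns. The fitted values, and hence $SS_{model}$, do not depend on which least squares solution is returned.

```python
import numpy as np

def residual_model_r2(y_ring, X_ring):
    """Return (R2, SS_model, SS_zigzag) for the residual model."""
    theta_hat, *_ = np.linalg.lstsq(X_ring, y_ring, rcond=None)
    ss_zigzag = float(y_ring @ y_ring)                            # original residuals
    ss_model = ss_zigzag - float(y_ring @ (X_ring @ theta_hat))   # form of (3.8)
    return 1.0 - ss_model / ss_zigzag, ss_model, ss_zigzag

# Synthetic check (hypothetical numbers): 6 blocks, 4 cultivars.
rng = np.random.default_rng(1)
n_blocks, n_cult = 6, 4
x_adj = np.linspace(1.0, 6.0, n_blocks)
beta_adj = np.array([0.8, 0.9, 1.1, 1.2])
X_ring = np.hstack([
    np.kron(np.eye(n_cult), np.column_stack([np.ones(n_blocks), x_adj])),
    np.kron(beta_adj.reshape(-1, 1), np.eye(n_blocks)),
])

# Case 1: residuals orthogonal to every column of the design matrix.
noise = rng.standard_normal(n_cult * n_blocks)
proj, *_ = np.linalg.lstsq(X_ring, noise, rcond=None)
y_clean = noise - X_ring @ proj
r2_clean, _, _ = residual_model_r2(y_clean, X_ring)

# Case 2: residuals contaminated by non-zero deviations (gamma, eta, u).
y_defective = y_clean + X_ring @ rng.standard_normal(X_ring.shape[1])
r2_defective, _, _ = residual_model_r2(y_defective, X_ring)
```

In this sketch $R^2$ is numerically zero when the residual vector is orthogonal to all columns of the design matrix, and it moves towards one when the residuals still contain a component explained by the deviations.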
4. Application
This case consists of yield data of oats obtained in experiments carried out from 1993 to 1997 by the Estação Nacional de Melhoramento de Plantas (the Portuguese plant breeding board), which kindly allowed us to use them. The experiments were designed as randomized blocks, with four blocks and eleven cultivars per experiment. The locations and cultivars were not the same in the different years; the locations and years are presented in Table 1.
Table 1. Locations and years of the trials
Location / Experimental station    Year
Elvas (ENMP)                       1993, 1994, 1995, 1997
Elvas (Herdade da Comenda)         1994, 1996, 1997
Benavila                           1994
Évora                              1993, 1994, 1995, 1996, 1997
Beja                               1993, 1994, 1995, 1996, 1997
Pegões                             1993, 1995, 1996
Applying the zig-zag algorithm for each year and each block (row) we obtained, using (3.2), the adjusted regression coefficients as well as the adjusted environmental indexes. These results are shown in Tables 2 and 3.
Table 2. Adjusted regression coefficients
Cultivars (row labels): AE8901, AE9004, AE8902, AE9005, S.VICENTE, AE9002, STALEIXO, AE9001, AE9003, AE8801, AVON, AE9303, AE9101, AE9302, AE9301, AE9401, AE9402, AE9403
Values are listed per year in table row order ("-": no value).
1993  α̃: -108.46, -137.07, -37.35, -7.76, -8.26, 12.96, 49.60, 59.11, 67.56, 91.26, 60.58
1993  β̃: 1.2702, 1.2046, 1.0257, 1.0091, 0.9877, 0.9822, 0.9499, 0.9087, 0.8967, 0.8904, 0.8150
1994  α̃: 208.49, -, -332.07, -, -228.88, 209.49, -55.15, 22.45, -, 96.36, -119.61, -82.66, 175.34, 171.85
1994  β̃: 0.9293, -, 1.2472, -, 1.1113, 0.8843, 0.9024, 1.0042, -, 0.9282, 1.0839, 0.9710, 0.9490, 0.9373
1995  α̃: 67.51, 32.11, -, -79.20, -, 73.94, -0.26, -249.91, -24.32, 130.79, -72.59, 68.19, 73.46
1995  β̃: 0.9724, 1.0037, -, 1.2348, -, 0.9027, 0.9928, 1.2102, 1.0100, 0.9104, 1.0525, 0.8716, 0.7130
1996  α̃: -595.26, 679.48, -, 886.51, -, 128.66, 67.49, 63.97, -197.36, 835.84, -1079.9, -156.39, -268.83
1996  β̃: 1.2553, 0.9060, -, 0.9145, -, 0.9327, 0.9702, 0.9470, 1.1164, 0.7276, 1.2625, 0.9913, 0.8236
1997  α̃: 346.49, 96.87, -, -372.23, -, 252.54, -91.49, -84.54, -10.61, 361.36, -346.72, 15.03, 31.50
1997  β̃: 0.9014, 1.1276, -, 1.5035, -, 0.8008, 1.0037, 0.9704, 1.0981, 0.6333, 1.1831, 0.8923, 0.8587
Table 3. Adjusted environmental indexes
Year  Number of blocks  Adjusted environmental indexes
1993  16  818.59 681.89 692.94 684.85 512.38 583.62 564.36 606.74 1362.70 1298.35 1325.97 1539.50 134.34 147.60 161.45 169.13
1994  20  3073.93 3715.50 3793.97 3891.24 3376.61 3290.09 3934.17 3838.16 4112.04 5032.46 5004.31 4888.40 642.85 949.77 847.68 913.36 3289.87 2976.09 2673.59 3492.01
1995  16  1502.61 1357.96 1084.18 1347.66 1182.01 1452.43 1513.96 1465.87 1394.12 1848.48 1881.31 1680.99 614.40 774.51 389.83 380.41
1996  16  4353.88 4787.51 4835.70 4897.19 1976.45 3693.10 3638.71 3573.67 2517.13 3087.93 3287.93 3516.68 1807.90 1988.30 1807.13 1634.20
1997  16  1163.74 1512.78 1544.79 1306.06 1928.69 2061.29 2183.35 2384.54 813.82 995.58 1090.89 935.58 1450.49 2103.36 2535.30 2836.36
Next, we used the adjusted environmental indexes to adjust the models for the residuals. In Table 4 we present, for each row, the adjusted corrections for the regression coefficients and the environmental indexes.
Lastly, in Table 5 we present the sums of squares for the zig-zag adjustment and for the model adjusted for the residuals, together with the corresponding values of $R^2$.
Table 4. Adjusted corrections for regression coefficients and environmental indexes
Values are listed per year in table row order.
1993  γ_j: 68.63, -35.14, -79.00, 42.54, 45.65, 34.25, 47.97, 31.91, 58.91, -13.32, 37.48
1993  η_j: 0.015, 0.046, 0.068, 0.022, -0.024, 0.007, -0.017, -0.013, 0.019, -0.046, -0.083
1993  u_i: 17.24, -36.58, 21.82, 26.34, -37.53, -62.97, 17.83, 11.07, 22.83, 54.94, -29.07, 31.64, -61.71, 85.29, 28.36, -55.25
1994  γ_j: -10.81, -48.11, 4.35, -40.83, 18.98, -3.69, -9.53, -12.36, 14.27, -3.49, -27.85
1994  η_j: 0.018, 0.010, -0.002, 0.004, -0.018, 0.008, -0.008, -0.010, 0.038, 0.032, -0.008
1994  u_i: -47.99, -84.80, -178.8, -123.4, -138.4, -20.49, -41.40, 142.48, 49.06, 185.44, 106.67, -175.1, 31.15, 60.38, 30.68, 17.24, -0.61, 18.80, 73.65, -88.52
1995  γ_j: -29.66, -13.41, -0.55, -76.42, 48.03, -17.41, 87.05, 28.56, 32.97, -56.28, 26.81
1995  η_j: 0.017, 0.001, -0.002, 0.004, -0.018, 0.008, -0.008, -0.010, 0.038, 0.032, -0.008
1995  u_i: -39.74, -90.32, 69.59, 14.65, -10.13, -44.19, -12.92, 20.01, -89.17, -138.2, -54.77, -26.81, 48.53, 14.17, -18.64, -37.70
1996  γ_j: -45.64, 37.54, 35.89, 54.00, 66.69, 10.33, -14.12, 13.90, 26.36, 69.24, 35.91
1996  η_j: -0.003, -0.003, 0.002, -0.002, 0.001, 0.688, 0.003, 0.011, 0.002, 0.005, -0.001
1996  u_i: 17.50, -27.53, -12.25, -13.76, 17.73, -86.86, -60.85, 40.38, 14.00, -22.54, -30.18, -17.04, 19.13, -13.26, 61.60, -0.07
1997  γ_j: -17.59, 49.05, -0.34, -56.13, -74.74, -25.30, -12.06, 78.95, 47.02, -51.76, -0.61
1997  η_j: -0.010, -0.002, 0.006, 0.009, -0.012, 0.011, -0.025, 0.002, -0.028, 0.006, 0.001
1997  u_i: -70.92, 17.57, 7.89, 91.54, 24.24, -36.71, -17.82, -10.97, -0.029, 30.42, 57.00, -0.91, -59.80, -29.70, 30.12, -25.99
Table 5. Comparison of the sums of squares of residuals
                           1993         1994         1995         1996         1997
Sum of squares (zig-zag)   1.9283·10⁶   5.4067·10⁷   1.0998·10⁷   3.1285·10⁷   2.0115·10⁷
Sum of squares (model)     1.9283·10⁶   5.4067·10⁷   1.0998·10⁷   3.1285·10⁷   2.0104·10⁷
R²                         1.4·10⁻⁵     1.8·10⁻⁵     4.9·10⁻⁷     2.6·10⁻⁵     5.5·10⁻⁴
5. Conclusion
The low values of R² validate the use of the zig-zag algorithm in this case.
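As a check of (3.9) against Table 5, the 1997 column is the only one in which the two sums of squares differ at the printed precision, so it is the only one for which $R^2$ can be recomputed from the table. Using the rounded values shown there:

```latex
R^2_{1997} = 1 - \frac{SS_{model}}{SS_{zig\text{-}zag}}
           = 1 - \frac{2.0104 \cdot 10^{7}}{2.0115 \cdot 10^{7}}
           = \frac{0.0011}{2.0115}
           \approx 5.5 \cdot 10^{-4}
```

This agrees with the tabulated value. For the other years the differences between the two sums of squares are below the last printed digit, which is consistent with values of $R^2$ of order $10^{-5}$ or smaller.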
REFERENCES
Mexia J.T., Pereira D.G., Baeta J. (2001): Weighted linear joint regression analysis. Biometrical Letters 38: 33−40.
Mexia J.T., Pinto I. (2004): Comparison of wheat cultivars in Portugal (1986-2000). Colloquium Biometryczne 34: 137−145.
Morais, V., Vieira C. (2006): MATLAB 7&6 Curso Completo. FCA.
Pereira D.G. (2003): Análise Conjunta Pesada de Regressões em Redes de Ensaios. Tese de Doutoramento. Universidade de Évora.
Pereira D.G., Mexia J.T. (2003): Reproducibility of Joint Regression Analysis. Colloquium Biometryczne 33: 279−291.
Pereira D.G., Mexia J.T. (2003a): The use of Joint Regression Analysis in selecting recommended cultivars. Biuletyn Oceny Odmian (Cultivar Testing Bulletin) 31: 19−25.
Pinto I. (2005): Análise Conjunta de Regressões e Planos de Melhoramento. Tese de Doutoramento. FCT- Universidade Nova de Lisboa.