2005 - Best Linear Unbiased Prediction of Cultivar Effects For Subdivided Target Regions

The paper discusses the importance of breeding for local adaptation in plant cultivars by utilizing best linear unbiased prediction (BLUP) methods to combine data from subdivided target regions. It highlights the trade-off between broad and local adaptation, emphasizing that a weighted approach can enhance prediction accuracy for both objectives. The authors propose a mixed model framework that allows for optimal use of data from neighboring subregions to improve yield estimates in the targeted subregion.


Published online May 6, 2005

Best Linear Unbiased Prediction of Cultivar Effects for Subdivided Target Regions
H. P. Piepho* and J. Möhring

ABSTRACT

Breeding for local adaptation may be economically viable providing there is sufficient genotype × subregion interaction. If the targeted subregion is part of a larger region covered by a testing network, information from neighboring subregions can be exploited to gain more precise estimates for the targeted subregion. For balanced data, the simplest approach is to use genotypic mean estimates for the whole target region, and this has often been shown to yield better predictions than simple means per subregion. A disadvantage of this approach is that it gives equal weight to all neighboring subregions and the targeted subregion, thus ignoring potential heterogeneity in information content. The objective of the present paper is to propose a method that allows a weighted combination of data from several subregions and to compare that method to other estimators. The proposed method is based on best linear unbiased prediction, which employs a weighted mean of subregion means. It follows from the theory of mixed models that the resulting estimator is optimal under the assumed model, minimizing prediction errors and maximizing the expected gain from selection. Using published variance component estimates, we found the resulting predictions to be superior to other approaches. We also show that the estimator is beneficial when selecting for global adaptation.

Reproduced from Crop Science. Published by Crop Science Society of America. All copyrights reserved.

Plant breeders usually seek to develop broadly adapted varieties for a wider target region. If the target region is agroecologically diverse, it may be worthwhile to stratify the target into more homogeneous subregions. Stratification will allow more accurate overall performance estimates for candidate varieties in the target region, thus increasing gain from selection. Alternatively, plant breeders may opt to develop locally adapted varieties for specific subregions. Breeding for local adaptation will be worthwhile only when there is substantial genotype × subregion interaction. Moreover, division of a target region will be accompanied by a division of testing resources. Thus, despite presence of substantial genotype × subregion interaction, it may turn out to be more efficient to breed for broad adaptation, if resources are not sufficient for accurately detecting locally adapted genotypes. This has been lucidly demonstrated by Curnow (1988) and Atlin et al. (2000), who studied the response to selection in subdivided target regions. The authors considered genotypic means in a large target and constituent subregions as correlated traits. They showed that the correlated response to selection for overall performance may outperform the direct response to selection within subregions.

There are two opposing factors that will determine if breeding for local adaptation is worthwhile. On the one hand, subdividing the target tends to increase heritabilities (on a mean basis) within subregions, essentially because the genotype × subregion interaction variance becomes a genetic variance component. On the other hand, subdivision of resources will leave only a limited number of trials per subregion, thus decreasing heritabilities within subregions (Atlin et al., 2000). This is why selection based on subregion means alone is not necessarily a good strategy.

Regional trial networks designed to provide cultivar recommendations present breeders with a similar dilemma. If recommendations are based on overall performance in the target region, cultivars with good local adaptation in one or more subregions may go unnoticed, resulting in suboptimal cultivar recommendations. Conversely, an attempt to detect local adaptation by subdivision of the target may be compromised by a limited number of trials remaining per subregion.

The common view seems to be that there are basically two alternative approaches to estimation, depending on whether one strives for broad or local adaptation: (i) if the objective is broad adaptation, ignore subregions and use all data to make selections or recommendations based on overall performance in the target region; (ii) if the objective is local adaptation, use only data from the targeted subregion for selection or recommendation. An associated notion is that selecting or recommending for local adaptation requires almost the same amount of resources within each subregion as would be needed for assessing broad adaptation with the same accuracy. This notion assumes that subregions are not substantially differentiated and that local adaptation can be detected only on the basis of data from the targeted subregion (Comstock and Moll, 1963; Talbot, 1997; Atlin et al., 2000). More often than not, the result of this common view has been that global adaptation is favored over local adaptation. This is usually a reasonable choice if one considers only the two alternatives described above.

The objective of this paper is to show that accuracy of yield estimates can be increased both for global and for local adaptation, if a slightly modified route of analysis is followed. The suggestion is to (i) always contemplate a subdivision of the target and (ii) always use all the data, employing a suitable weighting scheme, based on genetic variances and covariances among subregions, no matter whether broad or local adaptation is the objective. We show how standard mixed model procedures (best linear unbiased prediction) can be used for this task. The method is developed by initially considering
balanced data and a two-step approach. This simplifies the exposition and makes key features easier to appreciate. Subsequently, we will stress that a restricted maximum likelihood (REML)-based mixed model analysis demonstrates its full power with unbalanced data and we will explain how replicate data, balanced or unbalanced, can be dealt with in a single-step analysis. Some easy-to-use SAS code, which also works for unbalanced data, is presented in an appendix.

Bioinformatics Unit, Univ. of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany. Received 2 July 2004. Crop Breeding, Genetics & Cytology. *Corresponding author ([email protected]).

Published in Crop Sci. 45:1151–1159 (2005). doi:10.2135/cropsci2004.0398. © Crop Science Society of America, 677 S. Segoe Rd., Madison, WI 53711 USA.

CROP SCIENCE, VOL. 45, MAY–JUNE 2005

Abbreviations: ANOVA, analysis of variance; BLUP, best linear unbiased prediction; MSEP, mean squared error of prediction; REML, restricted maximum likelihood.

THEORY

Subdivision for Global Adaptation

If a random sample of locations is used for yield testing in each subregion, the resulting data may be regarded as a random stratified sample, with strata corresponding to subregions. As is well known from the theory of survey sampling (Kish, 1965), stratification of a target region is beneficial providing there is heterogeneity between subregions, governed by environmental factors such as climate, soil type, or topography. Gain in accuracy is largest in the one extreme (but hypothetical) case that all heterogeneity occurs between subregions, while there is complete homogeneity within subregions. In this extreme case, a single location per subregion would suffice to assess the expected cultivar yield per subregion. Conversely, in the other extreme case, where all heterogeneity occurs within subregions and none among subregions, stratification is not beneficial.

In stratified samples, the overall mean is estimated by a weighted mean of the subregion means, with the growing area per subregion used as a weight. To see that stratification is beneficial, again consider the extreme case where there is no heterogeneity within subregions, but considerable heterogeneity between subregions. The weighted mean based on a stratified sample, with growing areas used as weights, will have a variance of zero, while the simple mean based on an unstratified sample will have a variance depending on the heterogeneity between subregions. The only additional prerequisite for a stratified estimate of overall performance is that the growing areas per subregion must be available.

Estimation of Local Adaptation

Local adaptation in a subregion is usually assessed by analyzing data only from the targeted subregion (Talbot, 1997). It is often true, however, that some of the neighboring subregions are agroecologically similar to the subregion of interest. Thus, yield data from neighboring subregions may be exploited to improve yield estimates for the subregion of interest. A natural approach is to compute a weighted mean of mean yields in the targeted subregion and the neighboring subregions, with weights depending on the similarity between subregions and the number of trials per subregion. In fact, with a weighting approach, one could use data from all subregions for estimation in the targeted subregion, provided that weights are available that are optimal or near-optimal in terms of the error of prediction for the targeted subregion. We will show, subsequently, that best linear unbiased prediction (BLUP; Searle et al., 1992) is the method of choice for this task.

Mixed Model for Subdivided Target Regions

Our basic mixed model is

$$y_{irjk} = \mu_r + g_{ir} + e_{irjk}, \qquad [1]$$

where $\mu_r$ = expected value in the $r$th region, $g_{ir}$ = random genotypic value of $i$th genotype in $r$th subregion and $e_{irjk}$ = random environmental deviation of the $i$th genotype in the $k$th year and in the $j$th location within the $r$th subregion.

The environmental deviation will be modeled by a standard partition of the form

$$e_{irjk} = Y_k + L(S)_{j(r)} + (SY)_{rk} + LY(S)_{jk(r)} + (GY)_{ik} + GL(S)_{ij(r)} + (GSY)_{irk} + GLY(S)_{ijk(r)}, \qquad [2]$$

where $Y_k$ = main effect of $k$th year, $L(S)_{j(r)}$ = effect of $j$th location nested within $r$th subregion, $(SY)_{rk}$ = $rk$th subregion × year interaction, $LY(S)_{jk(r)}$ = $jk$th location × year interaction nested within $r$th subregion, $(GY)_{ik}$ = $ik$th genotype × year interaction, $GL(S)_{ij(r)}$ = $ij$th genotype × location interaction nested within $r$th subregion, $(GSY)_{irk}$ = $irk$th genotype × subregion × year interaction, $GLY(S)_{ijk(r)}$ = $ijk$th genotype × location × year interaction nested within $r$th subregion (includes error of a treatment mean). All effects appearing in $e_{irjk}$ are assumed to be independent homoscedastic normal deviates with zero mean.

The genetic effect may be further partitioned as

$$g_{ir} = G_i + (GS)_{ir}, \qquad [3]$$

where $G_i$ is a main effect for the $i$th genotype and $(GS)_{ir}$ is the $ir$th genotype × subregion interaction, assuming that $G_i$ and $(GS)_{ir}$ are independent homoscedastic normal deviates. This model implies that the variance-covariance model for $\mathbf{g}_i = (g_{i1}, g_{i2}, \ldots, g_{im})'$, where $m$ is the number of subregions, has the compound symmetry structure, i.e.,

$$\operatorname{var}(\mathbf{g}_i) = \boldsymbol{\Sigma}_g = \mathbf{J}_m \sigma^2_G + \mathbf{I}_m \sigma^2_{GS}, \qquad [4]$$

where $\mathbf{J}_m$ is an $m \times m$ matrix of ones everywhere, $\mathbf{I}_m$ is an $m$-dimensional identity matrix, and $\sigma^2_G$ and $\sigma^2_{GS}$ are the variances of $G_i$ and $(GS)_{ir}$, respectively. Under the compound symmetry model, genetic variances are the same in each subregion, and genetic covariances (and correlations) are the same for each pair of subregions. While this assumption may be useful in simple settings, more diverse settings call for more refined modeling. Specifically, some pairs of subregions may be more alike than others, requiring heterogeneity of covariances to be allowed for. Also, there may be heterogeneity of genetic variance between subregions. Many extensions of the ANOVA-type model, Eq. [3], have been proposed, which can be used for modeling $\boldsymbol{\Sigma}_g$ (Piepho, 1998, 1999; Smith et al., 2001). Here, we will confine attention to the compound symmetry model in order to facilitate comparison to other methods. It is stressed, however, that quite frequently more complex variance–covariance structures are needed.

Analysis of regional yield trials should be based on replicate data, using the model described above. This allows exploiting regional subdivision for estimation of both local and global adaptation in an optimal way. Following this approach, estimates of $g_{ir}$ may be obtained by BLUP using standard procedures (Searle et al., 1992). Some sample code for SAS is given in Appendix A. This single-step analysis may be contrasted to a two-step analysis, in which genotypic means per subregion are estimated in the first step. These means are then subjected to a mixed model analysis to obtain BLUPs of $g_{ir}$ in the second step. In balanced settings, both procedures yield identical results, while with unbalanced data, results differ and the REML-based single-step analysis is to be preferred. To study the properties of the BLUP procedure, it is more convenient to use the two-step approach and restrict attention to the balanced set-up, and this will be done subsequently.

Implied Model for Subregion Means

For demonstration purposes, we will assume here that the series of trials in each subregion is balanced in the following sense: On the basis of the mixed model described in the preceding section, taking genotypic effects $g_{ir}$ fixed and environmental effects random, the means of all cultivars in a subregion are homoscedastic, i.e., they all have the same variance. In addition, the means for all pairs of cultivars have the same covariance. This assumption is made here mainly to simplify the comparison of different estimators and to gain some insight into their statistical properties.

Let $y_{ir}$ denote the mean for the $i$th genotype in the $r$th subregion. We can assume the model

$$y_{ir} = \mu_r + g_{ir} + e_{ir}, \qquad [5]$$

where $e_{ir}$ is the error associated with $y_{ir}$. We find, conditioning on the genetic effects, that

$$
\begin{aligned}
E(\mathbf{y}_i) &= \boldsymbol{\mu} + \mathbf{g}_i, \\
E(\mathbf{e}_i) &= \mathbf{0}, \\
\operatorname{var}(\mathbf{e}_i) &= \boldsymbol{\Sigma}_{e1} + \boldsymbol{\Sigma}_{e2}, \\
\operatorname{cov}(\mathbf{e}_i, \mathbf{e}_{i'}) &= \boldsymbol{\Sigma}_{e2} \quad (i \neq i'),
\end{aligned}
\qquad [6]
$$

where $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{im})'$, $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_m)'$, $\mathbf{g}_i = (g_{i1}, g_{i2}, \ldots, g_{im})'$ and $\mathbf{e}_i = (e_{i1}, e_{i2}, \ldots, e_{im})'$. Now taking genetic effects random, we have $E(\mathbf{g}_i) = \mathbf{0}$, $\operatorname{var}(\mathbf{g}_i) = \boldsymbol{\Sigma}_g$, and $\operatorname{cov}(\mathbf{g}_i, \mathbf{e}_i) = \operatorname{cov}(\mathbf{g}_i, \mathbf{e}_{i'}) = \mathbf{0}$. For $\boldsymbol{\Sigma}_g$, one may assume the compound symmetry structure, Eq. [4], or some more general model.

For illustration, consider the special case that the design is completely balanced, i.e., in each of the $m$ subregions the $n$ genotypes are tested in $l$ locations and $y$ years. In this case, $\boldsymbol{\Sigma}_{e1}$ has diagonal elements $\sigma^2_{GY}/y + \sigma^2_{GSY}/y + \sigma^2_{GSL}/l + \sigma^2_{GSLY}/(ly)$ and off-diagonal elements $\sigma^2_{GY}/y$, while $\boldsymbol{\Sigma}_{e2}$ has diagonal elements $\sigma^2_{Y}/y + \sigma^2_{SY}/y + \sigma^2_{SL}/l + \sigma^2_{SLY}/(ly)$ and off-diagonal elements $\sigma^2_{Y}/y$. It should be stressed, however, that Eq. [6] is more broadly applicable, e.g., when the number of locations differs among years or among subregions or both, when only some locations are used for several years, while others are exchanged every year, or when the number of years is not the same for each subregion. Under the assumed model in Eq. [1] and [2], the only prerequisite for the validity of Eq. [6] is that the same set of $n$ genotypes is tested in each trial.

Estimation—Local and Global BLUP

On the basis of regional subdivision, the genotypic value in the target region can be expressed as

$$g_i = a_1 g_{i1} + a_2 g_{i2} + a_3 g_{i3} + \ldots + a_m g_{im} = \mathbf{a}'\mathbf{g}_i, \qquad [7]$$

where $a_r$ ($r = 1, \ldots, m$) are the relative growing areas in the $m$ subregions (expressed as proportions) and $\mathbf{a} = (a_1, a_2, \ldots, a_m)'$. We wish to either estimate $g_i$ (for assessing global adaptation) or $g_{ir}$ (for assessing local adaptation in the $r$th subregion). The estimable function of interest is of the general form

$$L_i = \mathbf{v}'\mathbf{g}_i \qquad [8]$$

with $\mathbf{v} = \mathbf{a}$ for $g_i$ (global adaptation) and $\mathbf{v} = \mathbf{u}_r$ for $g_{ir}$ (local adaptation), where $\mathbf{u}_r$ is a unit vector with $r$th element equal to unity and zeros elsewhere. For example, when there are five subregions, and an estimator is needed for the second region, we set $\mathbf{v} = \mathbf{u}_r = (0, 1, 0, 0, 0)'$. We consider estimators of $L_i$, which are of the form

$$\tilde{L}_i = \mathbf{w}'(\mathbf{y}_i - \boldsymbol{\mu}), \qquad [9]$$

where $\mathbf{w}$ are suitably chosen weights and the tilde indicates an estimator. Our preferred estimator is

$$\tilde{L}_i^{\mathrm{opt}} = \mathbf{v}'\tilde{\mathbf{g}}_i \quad \text{with} \qquad [10]$$

$$\tilde{\mathbf{g}}_i = \mathbf{W}(\mathbf{y}_i - \boldsymbol{\mu}), \quad \text{where} \qquad [11]$$

$$\mathbf{W} = \boldsymbol{\Sigma}_g(\boldsymbol{\Sigma}_g + \boldsymbol{\Sigma}_{e1})^{-1}, \qquad [12]$$

because it involves an optimal weighting scheme, as detailed below. Estimator [10] is a special case of [9] with weights $\mathbf{w}' = \mathbf{v}'\mathbf{W}$. These weights are optimal and follow from BLUP theory (Searle et al., 1992). The estimator [11] is essentially the BLUP of genetic effects $\mathbf{g}_i$ (see Appendix B). When $\mathbf{v} = \mathbf{u}_r$, we will refer to Eq. [10] as local BLUP, while when $\mathbf{v} = \mathbf{a}$, we refer to Eq. [10] as global BLUP.

In practice, variance components in $\mathbf{W}$ are unknown and need to be estimated. Plugging in estimates for parameters yields empirical BLUP with added uncertainty because of estimated weights. Providing parameters are estimated by REML using, e.g., a Newton-Raphson or Fisher scoring algorithm, uncertainty can be accounted for when computing approximate standard errors of BLUPs using the asymptotic dispersion matrix (Kackar and Harville, 1984). Generally, when a REML-based mixed model package such as MIXED is employed, the user need not worry about computation of weights and $\mathbf{W}$: these will be computed automatically for the BLUP of $\mathbf{g}_i$ on the basis of the fitted variance–covariance model. In the case of global adaptation, $L_i = \mathbf{v}'\mathbf{g}_i$ can be estimated from the BLUP for $\mathbf{g}_i$, which may require additional computation following the run of the mixed model routine. Note that the weighting matrix $\mathbf{W}$ is analogous to the broad-sense heritability in the case of an undivided target region. Generally, to estimate $g_{ir}$ for a particular subregion, information on the $i$th genotype is used from all subregions. The extent to which the information from neighboring subregions is exploited is determined by $\mathbf{W}$ and depends on the heritabilities in the neighboring subregions and on the genetic and environmental correlations with the subregion of interest, as determined by the structures of $\boldsymbol{\Sigma}_g$ and $\boldsymbol{\Sigma}_{e1}$, respectively.

Variance-Bias Trade-Off

The BLUP of $\mathbf{g}_i$ is unbiased in the sense that, across all genotypes, BLUP has the same expected value as $\mathbf{g}_i$ itself, i.e., $E(\mathbf{g}_i) = E[\mathrm{BLUP}(\mathbf{g}_i)] = \mathbf{0}$. Clearly, this expectation is unconditional (Searle et al., 1992, p. 269). By contrast, there is a bias conditional on the genotype, i.e., $E(\mathrm{BLUP}(\mathbf{g}_i) \mid \mathbf{g}_i) \neq \mathbf{g}_i$. The bias may increase when incorporating data from neighboring subregions. This may be counterbalanced, however, by the reduced estimation variance because of the use of more data. Thus, it is usually beneficial to exploit data from neighboring subregions.

The purpose of this section is to shed more light on our proposed procedure, explaining how the gain of efficiency comes about. The variance-bias trade-off can be most conveniently studied by considering pairwise differences among genotypic effect estimates. Note that the ranking of genotypes is fully determined by the set of all pairwise differences. The difference of two genotypes $i$ and $i'$ is given by

$$\delta_{ii'} = \mathbf{v}'(\mathbf{g}_i - \mathbf{g}_{i'}). \qquad [13]$$

For simplicity, we will drop the indices on $\delta$, i.e., we set $\delta \equiv \delta_{ii'}$. The difference $\delta$ is estimated by BLUP according to

$$\tilde{\delta} = \mathbf{v}'(\tilde{\mathbf{g}}_i - \tilde{\mathbf{g}}_{i'}). \qquad [14]$$

This estimator is biased, since for given genotypes $i$ and $i'$

$$E(\delta - \tilde{\delta} \mid \mathbf{g}_i, \mathbf{g}_{i'}) = \mathrm{Bias} = (\mathbf{v} - \mathbf{w})'(\mathbf{g}_i - \mathbf{g}_{i'}), \qquad [15]$$

where $\mathbf{w}' = \mathbf{v}'\mathbf{W}$. Thus, exploiting information from neighboring subregions introduces a bias. This may be more than offset, however, by a reduction in the estimation variance.

The common criterion for combining bias and variance is the mean squared error of prediction (MSEP), which in the case at hand is
$$\mathrm{MSEP} = E_g[(\delta - \tilde{\delta})^2 \mid \mathbf{g}_i, \mathbf{g}_{i'}] = E_g[\mathrm{Bias}^2 \mid \mathbf{g}_i, \mathbf{g}_{i'}] + \mathrm{Var}, \qquad [16]$$

where

$$E_g[\mathrm{Bias}^2 \mid \mathbf{g}_i, \mathbf{g}_{i'}] = 2(\mathbf{v} - \mathbf{w})'\boldsymbol{\Sigma}_g(\mathbf{v} - \mathbf{w}) \qquad [17]$$

and

$$\mathrm{Var} = \operatorname{var}(\tilde{\delta} \mid \mathbf{g}_i, \mathbf{g}_{i'}) = 2\mathbf{w}'\boldsymbol{\Sigma}_{e1}\mathbf{w}. \qquad [18]$$

The subscripted $g$ indicates that expectations are with respect to genotype pairs. It is seen from Eq. [16] that the MSEP depends on both bias and variance. The weights $\mathbf{W}$ in $\mathbf{w}' = \mathbf{v}'\mathbf{W}$ are chosen so as to minimize the MSEP. Thus, the optimal weights strike the best balance between bias and variance. Specifically, absence of bias is not a requirement: some bias can be tolerated, providing the MSEP is smaller than for an unbiased estimator. Not only does BLUP minimize the MSEP, but it also maximizes the response to selection in variance component models (Searle et al., 1992; but see Portnoy, 1982).

Gain from Selection

We consider response to selection based on the estimator $\tilde{L}_i$. In the special case that $\mathbf{w}' = \mathbf{v}'\mathbf{W}$, where $L_i = \mathbf{v}'\mathbf{g}_i$, this is our proposed optimal estimator $\tilde{L}_i^{\mathrm{opt}}$. For generality, we regard $\tilde{L}_i$ as an indirect trait and formulate the selection response as a correlated response to selection (Falconer and Mackay, 2001; Atlin et al., 2000). Selection for a direct trait is included as a special case with genetic correlation equal to unity. We have

$$\operatorname{var}(L_i) = \mathbf{v}'\boldsymbol{\Sigma}_g\mathbf{v}, \qquad [19]$$

$$\operatorname{var}(\tilde{L}_i) = \mathbf{w}'(\boldsymbol{\Sigma}_g + \boldsymbol{\Sigma}_{e1})\mathbf{w}, \qquad [20]$$

$$\operatorname{cov}(\tilde{L}_i, L_i) = \mathbf{v}'\boldsymbol{\Sigma}_g\mathbf{w}. \qquad [21]$$

The genetic correlation between $L_i$ and $\tilde{L}_i$ is

$$\rho_g = \frac{\mathbf{v}'\boldsymbol{\Sigma}_g\mathbf{w}}{\sqrt{(\mathbf{v}'\boldsymbol{\Sigma}_g\mathbf{v})(\mathbf{w}'\boldsymbol{\Sigma}_g\mathbf{w})}}. \qquad [22]$$

The heritability of $\tilde{L}_i$ equals

$$h^2 = \frac{\mathbf{w}'\boldsymbol{\Sigma}_g\mathbf{w}}{\mathbf{w}'(\boldsymbol{\Sigma}_g + \boldsymbol{\Sigma}_{e1})\mathbf{w}}. \qquad [23]$$

The correlated response to selection is

$$R = i\,\rho_g h \sqrt{\operatorname{var}(L_i)}, \qquad [24]$$

where $i$ is the selection intensity (Falconer and Mackay, 2001). For evaluation of different methods it is convenient to consider the ratio of $R$-values, since the selection intensity as well as $\sqrt{\operatorname{var}(L_i)}$ cancel out. Note that all results in this section are rather general in that they do not require a special structure for $\boldsymbol{\Sigma}_g$ or $\boldsymbol{\Sigma}_{e1}$, such as compound symmetric.

Variance Component Estimates

To illustrate our procedure, we use published variance component estimates for three multienvironment trials with wheat (Triticum aestivum L.), which are reproduced in Table 1. These estimates were also used by Atlin et al. (2000) to demonstrate their approach for estimation in subdivided target regions. We used the variance components in Table 1 to assign values to the variance components in our mixed model (Eq. [1], [2], and [3]) (Table 2). Following Atlin et al. (2000), the number of replications per trial was three, the total number of locations in the trial network was set equal to 12, and locations were evenly split among subregions. For simplicity, the variance of $(GSY)_{irk}$ was set to zero. Since our model is based on trial means, we set the variance of the residual effect $GLY(S)_{ijk(r)}$ equal to $\tilde{\sigma}^2_{GYL} + \tilde{\sigma}^2_E/3$, with variance components $\tilde{\sigma}^2_{GYL}$ and $\tilde{\sigma}^2_E$ taken from Table 1.

The compound symmetry structure was used for $\boldsymbol{\Sigma}_g$ (Eq. [4]). Assuming an orthogonal design per subregion, the diagonal elements of $\boldsymbol{\Sigma}_{e1}$ were equated to $\sigma^2_{GY}/y + \sigma^2_{GSY}/y + \sigma^2_{GSL}/l + \sigma^2_{GSLY}/(ly)$, where $l$ is the number of locations per subregion and $y$ is the number of years. The off-diagonal elements were set to $\sigma^2_{GY}/y$.

Table 1. Variance component estimates for three multienvironment trials (reproduced from Atlin et al., 2000), expressed as proportion of the phenotypic variance $\tilde{\sigma}^2_P = \tilde{\sigma}^2_G + \tilde{\sigma}^2_{GL} + \tilde{\sigma}^2_{GY} + \tilde{\sigma}^2_{GYL} + \tilde{\sigma}^2_E$.

| Crop | Region | Genotype ($\tilde{\sigma}^2_G$) | Genotype × location ($\tilde{\sigma}^2_{GL}$) | Genotype × year ($\tilde{\sigma}^2_{GY}$) | Genotype × year × location ($\tilde{\sigma}^2_{GYL}$) | Error ($\tilde{\sigma}^2_E$) |
|---|---|---|---|---|---|---|
| Winter wheat | Eastern Canada | 0.36 | 0.03 | 0.02 | 0.29 | 0.30 |
| Spring wheat | Western Canada | 0.29 | 0.11 | 0.02 | 0.27 | 0.31 |
| Spring wheat | Australia | 0.05 | 0.13 | 0.11 | 0.12 | 0.58 |

Table 2. Variance components for mixed model given by Eq. [1], [2], and [3] as derived from estimates in Table 1. $p$ is the proportion of $\tilde{\sigma}^2_{GL}$ assigned to $\sigma^2_{GS}$.

| Effect in model (2) | Variance | Value assigned from estimates in Table 1 |
|---|---|---|
| $G_i$ | $\sigma^2_G$ | $\tilde{\sigma}^2_G$ |
| $(GS)_{ir}$ | $\sigma^2_{GS}$ | $p\,\tilde{\sigma}^2_{GL}$ |
| $(GY)_{ik}$ | $\sigma^2_{GY}$ | $\tilde{\sigma}^2_{GY}$ |
| $GL(S)_{ij(r)}$ | $\sigma^2_{GSL}$ | $(1 - p)\,\tilde{\sigma}^2_{GL}$ |
| $(GSY)_{irk}$ | $\sigma^2_{GSY}$ | 0 |
| $GLY(S)_{ijk(r)}$ | $\sigma^2_{GSLY}$ | $\tilde{\sigma}^2_{GYL} + \tilde{\sigma}^2_E/3$ |

RESULTS

Estimation for Global Adaptation

To study the gain from stratification, we assumed that the target can be subdivided into subregions of equal size, i.e., the relative areas are $a_r = 1/m$. Estimation of the overall mean ($g_i$) in the target region by global BLUP was performed by setting $\mathbf{v} = \mathbf{a} = (a_1, a_2)'$ and $\mathbf{w}' = \mathbf{v}'\mathbf{W}$. The response to selection based on our method is denoted as $R_1$. For comparison, we computed the response to selection assuming that stratification is ignored and locations are a fully random sample from the target region ($R_0$). Ratios $R_1/R_0$ are reported in Table 3. The results show that one can only win by stratification and that the gain is largest when the genotype × subregion interaction variance is large relative to the variance of the genetic main effect. In the most favorable case reported in Table 3, stratification results in a 20% improvement in the response to selection.

We studied the effect of unequal subregion areas $a_r$ assuming that the target region is subdivided into two subregions with $a_1 = q$ and $a_2 = 1 - q$, where $q$ takes on
the values 0.3, 0.1, and 0.01. Estimation of the overall mean ($g_i$) in the target region by global BLUP was implemented by setting $\mathbf{v} = \mathbf{a} = (a_1, a_2)'$ and $\mathbf{w}' = \mathbf{v}'\mathbf{W}$. The associated response to selection is denoted as $R_1$. For comparison, the response to selection based on the genotypic main effect $G_i$ using the mixed model in Eq. [1], [2], and [3] was computed as proposed by Atlin et al. (2000). This is denoted as $R_2$. The method corresponds to selection using a simple mean of yields in the two subregions, i.e., $\mathbf{w} = (0.5, 0.5)'$. It should be stressed that both estimators exploit stratification of the target region. Differences in performance are therefore solely due to the contrasting weighting schemes. Of course, when the relative areas are the same ($q = 0.5$), both methods yield identical results. The ratio $R_1/R_2$ is reported in Table 4. The more substantial the genotype × location interaction in the target region and the less equal the relative growing areas, the more pronounced is the gain from accounting for unequal subregion areas by global BLUP.

Table 3. Selection for global adaptation. Ratio $R_1/R_0$, where $R_0$ = response to selection for $g_i$ ignoring stratification and $R_1$ = response to selection for $g_i$ using global BLUP. Analysis based on three replications per trial. $p$ is the proportion of $\tilde{\sigma}^2_{GL}$ assigned to $\sigma^2_{GS}$ (see Table 2).

| Trial ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_G$) | $p$ | 1 yr, $l$ = 2 | 1 yr, $l$ = 4 | 1 yr, $l$ = 6 | 2 yr, $l$ = 2 | 2 yr, $l$ = 4 | 2 yr, $l$ = 6 |
|---|---|---|---|---|---|---|---|
| Winter wheat in eastern Canada (0.08) | 0.1 | 1.000 | 1.000 | 1.001 | 1.000 | 1.000 | 1.000 |
| | 0.3 | 1.001 | 1.001 | 1.002 | 1.001 | 1.001 | 1.001 |
| | 0.5 | 1.002 | 1.002 | 1.003 | 1.002 | 1.002 | 1.002 |
| Spring wheat in western Canada (0.37) | 0.1 | 1.002 | 1.002 | 1.003 | 1.002 | 1.002 | 1.002 |
| | 0.3 | 1.005 | 1.007 | 1.008 | 1.005 | 1.006 | 1.007 |
| | 0.5 | 1.009 | 1.011 | 1.014 | 1.009 | 1.010 | 1.011 |
| Spring wheat in Australia (2.67) | 0.1 | 1.019 | 1.034 | 1.049 | 1.017 | 1.030 | 1.041 |
| | 0.3 | 1.054 | 1.096 | 1.133 | 1.050 | 1.082 | 1.111 |
| | 0.5 | 1.088 | 1.151 | 1.204 | 1.081 | 1.128 | 1.168 |

Table 4. Selection for global adaptation. Ratio $R_1/R_2$, where $R_1$ = response to selection for $g_i$ using global BLUP and $R_2$ = response to selection for genotypic main effect $G_i$ as proposed by Atlin et al. (2000). Analysis assumes $l$ = 6 locations per subregion, $s$ = 2 subregions, and three replications per trial. Relative areas $a_1 = q$ and $a_2 = 1 - q$. $p$ is the proportion of $\tilde{\sigma}^2_{GL}$ assigned to $\sigma^2_{GS}$ (see Table 2).

| Trial ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_G$) | $p$ | 1 yr, $q$ = 0.3 | 1 yr, $q$ = 0.1 | 1 yr, $q$ = 0.01 | 2 yr, $q$ = 0.3 | 2 yr, $q$ = 0.1 | 2 yr, $q$ = 0.01 |
|---|---|---|---|---|---|---|---|
| Winter wheat in eastern Canada (0.08) | 0.1 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | 0.3 | 1.000 | 1.001 | 1.001 | 1.000 | 1.001 | 1.001 |
| | 0.5 | 1.000 | 1.001 | 1.002 | 1.001 | 1.002 | 1.003 |
| Spring wheat in western Canada (0.37) | 0.1 | 1.000 | 1.001 | 1.001 | 1.000 | 1.001 | 1.002 |
| | 0.3 | 1.002 | 1.006 | 1.009 | 1.002 | 1.008 | 1.012 |
| | 0.5 | 1.004 | 1.014 | 1.021 | 1.004 | 1.017 | 1.026 |
| Spring wheat in Australia (2.67) | 0.1 | 1.005 | 1.020 | 1.030 | 1.005 | 1.019 | 1.029 |
| | 0.3 | 1.025 | 1.096 | 1.141 | 1.023 | 1.087 | 1.128 |
| | 0.5 | 1.043 | 1.161 | 1.233 | 1.037 | 1.142 | 1.207 |

Estimation for Local Adaptation

Local BLUPs of the subregion mean were obtained by setting $\mathbf{v} = \mathbf{u}_r$ and $\mathbf{w}' = \mathbf{v}'\mathbf{W}$. The response to selection based on local BLUP is denoted as $R_3$. For comparison, selection using the subregion mean was implemented by setting $\mathbf{w} = \mathbf{v} = \mathbf{u}_r$. The response to selection by this method is denoted as $R_4$. The response to selection based on the unweighted global mean using the mixed model in Eq. [1], [2], and [3] was computed as proposed by Atlin et al. (2000). This is denoted as $R_2$. The ratios $R_2/R_4$ and $R_3/R_4$ are reported in Table 5.

The most important result is that local BLUP always does better than selection based on a subregion mean or than selection based on the global mean as proposed by Atlin et al. (2000). In some cases, the differences are quite marked; in others, differences are minor. When the genotype × subregion interaction is substantial (spring wheat in Australia), the global mean performs poorly, while local BLUP is slightly better than the subregion mean. When the genotype × subregion interaction is small (winter wheat in eastern Canada), the global mean almost always outperforms the subregion mean, but is itself outperformed by local BLUP.

Table 6 shows relative weights for the targeted subregion and the other subregions. The latter are equal for all subregions because of balancedness of the design and the compound symmetry model for $\boldsymbol{\Sigma}_g$. The larger the number of years and locations, the smaller will be the magnitude of $\boldsymbol{\Sigma}_{e1}$, thus increasing the weight for the targeted subregion relative to the other subregions. In the limit as $\boldsymbol{\Sigma}_{e1}$ tends to zero, all weight will be on the target subregion, no matter what is the assumed structure for $\boldsymbol{\Sigma}_g$. Also, the smaller the genetic correlation,

Table 5. Selection for local adaptation. Ratios $R_2/R_4$ and $R_3/R_4$, where $R_2$ = response to selection for genotypic main effect $G_i$ as proposed by Atlin et al. (2000), $R_3$ = response to selection for $g_{ir}$ using local BLUP, and $R_4$ = response to selection based on subregion mean alone. Analysis based on three replications per trial. $p$ is the proportion of $\tilde{\sigma}^2_{GL}$ assigned to $\sigma^2_{GS}$ (see Table 2).

| Trial ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_G$) | $p$ | Ratio | 1 yr, $l$ = 2 | 1 yr, $l$ = 4 | 1 yr, $l$ = 6 | 2 yr, $l$ = 2 | 2 yr, $l$ = 4 | 2 yr, $l$ = 6 |
|---|---|---|---|---|---|---|---|---|
| Winter wheat in eastern Canada (0.08) | 0.1 | $R_2/R_4$ | 1.184 | 1.074 | 1.034 | 1.106 | 1.040 | 1.017 |
| | | $R_3/R_4$ | 1.185 | 1.077 | 1.038 | 1.108 | 1.043 | 1.021 |
| | 0.3 | $R_2/R_4$ | 1.167 | 1.059 | 1.020 | 1.091 | 1.027 | 1.003 |
| | | $R_3/R_4$ | 1.172 | 1.069 | 1.033 | 1.096 | 1.037 | 1.017 |
| | 0.5 | $R_2/R_4$ | 1.150 | 1.045 | 1.006 | 1.076 | 1.014 | 0.990 |
| | | $R_3/R_4$ | 1.160 | 1.062 | 1.029 | 1.086 | 1.031 | 1.014 |
| Spring wheat in western Canada (0.37) | 0.1 | $R_2/R_4$ | 1.214 | 1.074 | 1.023 | 1.137 | 1.041 | 1.005 |
| | | $R_3/R_4$ | 1.222 | 1.089 | 1.043 | 1.146 | 1.056 | 1.026 |
| | 0.3 | $R_2/R_4$ | 1.139 | 1.012 | 0.962 | 1.070 | 0.983 | 0.947 |
| | | $R_3/R_4$ | 1.169 | 1.061 | 1.027 | 1.101 | 1.034 | 1.014 |
| | 0.5 | $R_2/R_4$ | 1.074 | 0.957 | 0.909 | 1.010 | 0.931 | 0.896 |
| | | $R_3/R_4$ | 1.128 | 1.042 | 1.017 | 1.070 | 1.020 | 1.008 |
| Spring wheat in Australia (2.67) | 0.1 | $R_2/R_4$ | 1.111 | 0.940 | 0.873 | 1.111 | 0.943 | 0.876 |
| | | $R_3/R_4$ | 1.185 | 1.053 | 1.017 | 1.185 | 1.056 | 1.019 |
| | 0.3 | $R_2/R_4$ | 0.795 | 0.677 | 0.628 | 0.800 | 0.685 | 0.635 |
| | | $R_3/R_4$ | 1.036 | 1.001 | 1.001 | 1.039 | 1.003 | 1.000 |
| | 0.5 | $R_2/R_4$ | 0.622 | 0.532 | 0.493 | 0.629 | 0.542 | 0.501 |
| | | $R_3/R_4$ | 1.003 | 1.004 | 1.010 | 1.005 | 1.001 | 1.004 |
1156 CROP SCIENCE, VOL. 45, MAY–JUNE 2005

i.e., the larger the diagonal elements of $\Sigma_g$ in relation to the off-diagonal elements, the higher the weights for the target subregion. When the genetic correlation is very low, as in the Australian wheat data, negative weights may occur for nontarget subregions. This is a result of the genotype × year interaction component, which introduces a positive environmental covariance in $\Sigma_{e1}$. The negative weights are usually small in absolute value, and virtually all the weight lies on the target subregion.

Table 6. Selection for local adaptation. Standardized weights $w'/(w'1)$ for local BLUP of $g_{ir}$. Analysis based on three replications per trial. $p$ is the proportion of $\tilde{\sigma}^2_{GL}$ assigned to $\sigma^2_{GS}$ (see Table 2).

Standardized weights $w'/(w'1)$

                            1 yr                       2 yr
p     Subregion   l = 2    l = 4    l = 6     l = 2    l = 4    l = 6

Winter wheat in eastern Canada ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_{G} = 0.08$)
0.1   Target      0.180    0.355    0.524     0.190    0.370    0.540
      Other       0.164    0.323    0.476     0.160    0.315    0.460
0.3   Target      0.207    0.395    0.567     0.236    0.436    0.608
      Other       0.159    0.302    0.433     0.153    0.282    0.392
0.5   Target      0.233    0.432    0.604     0.279    0.493    0.661
      Other       0.153    0.284    0.396     0.144    0.254    0.339

Spring wheat in western Canada ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_{G} = 0.37$)
0.1   Target      0.211    0.402    0.574     0.233    0.433    0.605
      Other       0.158    0.299    0.426     0.153    0.284    0.395
0.3   Target      0.294    0.513    0.681     0.352    0.579    0.737
      Other       0.141    0.243    0.319     0.130    0.210    0.263
0.5   Target      0.396    0.600    0.756     0.454    0.682    0.816
      Other       0.126    0.200    0.244     0.109    0.159    0.184

Spring wheat in Australia ($\tilde{\sigma}^2_{GL}/\tilde{\sigma}^2_{G} = 2.67$)
0.1   Target      0.347    0.598    0.775     0.347    0.592    0.764
      Other       0.131    0.201    0.225     0.131    0.204    0.236
0.3   Target      0.644    0.942    1.062     0.633    0.900    1.007
      Other       0.071    0.029   −0.062     0.073    0.050   −0.007
0.5   Target      0.875    1.141    1.189     0.847    1.062    1.103
      Other       0.025   −0.071   −0.189     0.031   −0.031   −0.103

Reproduced from Crop Science. Published by Crop Science Society of America. All copyrights reserved.

DISCUSSION

Plant breeders and extension service personnel frequently consider a subdivision of a target region into smaller subregions, which in themselves are more homogeneous than the overall target region. Subdivision does not necessarily imply, however, that data from one subregion are not informative regarding another targeted subregion. In this paper, we have described a weighting scheme, based on standard mixed model procedures (BLUP), which allows information from different subregions to be efficiently combined. The weighting approach has been shown to be beneficial for two different objectives, one being estimation of performance in a specific subregion, the other estimation of the global mean in the target region. For estimating the mean in a targeted subregion, local BLUP combines yield data from all subregions in an optimal way. If there is little information in the neighboring subregions because of large genotype × subregion interaction, the neighbors will be assigned small weights. Conversely, if the information content is high, weights assigned to neighbors will be relatively high. The BLUP procedure will yield the optimal weights from the variance components in a quasi-automatic fashion. For estimating the global mean, accuracy can be improved by stratification into subregions and using growing areas per subregion as weights to combine local BLUPs for subregions into a global BLUP.

Curnow (1988) and Atlin et al. (2000) have made a very important contribution in demonstrating that information from neighboring subregions may be informative for a targeted subregion. Atlin et al. (2000) considered two alternatives, i.e., use all data to estimate the global mean, giving equal weight to all subregions, or use only the data from the targeted subregion. They showed that the local mean might be outperformed by the global mean if genetic correlations are high between subregions. In these cases, they propose not to subdivide the target region. Our approach obviates the need to choose between these two simple alternatives. The method always uses all data, giving different weight to the targeted subregion and its neighbors, with weights depending on the objective of estimation (local or global adaptation) and on the information provided by each subregion. Also, as opposed to the procedures considered by Curnow (1988) and Atlin et al. (2000), our method will always benefit from a subdivision of the target, provided there is heterogeneity among subregions and variance components are known.

Under the assumed model and provided the variance components are known, our weighted estimator will be optimal in the sense that it minimizes the MSEP and maximizes the expected response to selection. Specifically, local BLUP will give the best estimate of $g_{ir}$ and global BLUP will give the best estimate of $g_i$. If sample estimates are plugged in (empirical BLUP), some loss of accuracy will result, and optimality can no longer be guaranteed, though near-optimality is usually achieved. It may be a worthwhile strategy to use long-term data to obtain more accurate variance component estimates. This gain in accuracy will need to be balanced, however, against a potential long-term shift in variance components due to breeding progress as well as advances in agronomic practices. It would be worthwhile to conduct a simulation study on the effect of estimation error in the variance parameters on the response to selection. This will be the subject of future research.

When considering the relative merits of local and global BLUP, it is useful to distinguish two different objectives: (i) geographical placement of cultivars and (ii) selection of entries in a breeding program. Local BLUP will maximize the gain from selection for local adaptation. Thus, placement of cultivars should be based on local BLUP, if a meaningful subdivision of the target region is available and data permit reliable estimates of all variance components. By contrast, it is not so straightforward to decide whether local or global BLUP is preferable for breeding purposes. While selecting for global adaptation may reduce the selection gain in a particular subregion, it does have the advantage of addressing a larger growing area. Therefore, breeding for local adaptation will be worthwhile only if the superiority of local response to selection more than compensates for the loss from a reduction in growing area where the cultivar can be successfully marketed.

We have shown that global BLUP outperforms the
unweighted mean across subregions, when areas per subregion are unequal. Assuming compound symmetry and balanced data, the genetic correlation between the unweighted mean and the true mean effect in the target ($g_i$) is

$$\rho_g = \sqrt{\frac{m\sigma^2_G + \sigma^2_{GS}}{m(\sigma^2_G + a'a\,\sigma^2_{GS})}}. \qquad [25]$$

This will equal unity only when all subregions are of equal size, so that $a'a = m^{-1}$. The more heterogeneous the growing areas, the smaller the genetic correlation. This shows why, for unequally sized subregions, the unweighted analysis is suboptimal.

Our conceptual framework assumes that there is a larger target region, which may be subdivided into agroecologically distinct subregions. The demarcation between target regions (broad, global) and subregions (narrow, local) will depend on the type and scale of breeding application. For example, some breeders are part of a multinational (company or IARC) effort, whereas others are part of a national, state or provincial level program. What is a subregion to one is a broad region to another. Thus, the breeder will have to decide on a clear definition of the target region and subregions. In so doing, one will need to balance the geographic and climatological context against the organizational context. For a given latitudinal zone, there may be a very high degree of similarity between distant locations, but across longitudinal or elevational distances, closer subregions may rapidly become quite dissimilar. Subdivision will be most useful if homogeneity within subregions is maximized, i.e., genotype × location interaction within subregions is minimized, while genotype × subregion interaction is maximized. In this case, data from the targeted subregion are very informative, while information from neighboring subregions will be relatively small. Conversely, if subdivision mainly follows administrative boundaries, the subregions may not be very distinct agroecologically, thus increasing the information content from neighboring subregions. Clearly, agroecological factors are usually better criteria for subdivision than administrative boundaries. In either case, optimal weighting of information from targeted and neighboring subregions is crucial, and this may be achieved in a convenient way by BLUP.

Performance of our methodology critically depends on good estimates of the variance–covariance structure for genetic effects. Efficiency will be compromised if subdivision leads to subregions with no more than one or two test locations, especially when heterogeneity of covariance needs to be accounted for. Thus, it is a good strategy to have a larger number of test locations per subregion. The optimal number of locations per subregion will depend on a number of factors, such as the magnitude of and relationships among the variance components, the degree of agroecological differentiation between subregions, the objective of estimation (global or local adaptation), the number of years available for analysis, the design used with individual trials, etc. These factors are best studied by comprehensive simulation, and this will be the subject of future work.

Mixed model analysis of multienvironment trials requires random locations to allow broad inferences with respect to the target region. The requirement of a truly random sample of locations from the whole target may not always be easy to satisfy in practice, particularly when locations are selected to be representative. The notion of representativeness of certain locations usually implies a stratification of the target region. Thus, instead of selecting representative locations, one may subdivide the target region into (representative) subregions and then take truly random samples of locations per subregion. Breeders may be more comfortable with this type of restricted random sampling from subregions than with a fully random sample of locations from the whole target region.

Our approach treats genotypes as a random factor. Many trial systems are set up to test released cultivars or elite genotypes that have undergone several cycles of selection at the later stages of a breeding program. This raises the question concerning the population of entries to which results should apply. One possible view is that the population is defined by the potential set of entries that could have been obtained by the same breeding programs that generated the entries under consideration. The entries in the trials can be considered as a random sample of this hypothetical population. Clearly, the entries are not a random sample from the genotypes available at the beginning of the breeding process. Under a random genotypes model, it is also possible to incorporate pedigree information, using a model allowing for genetic correlation among related genotypes (Piepho and Pillen, 2004). Such models may be useful for multi-environment testing in complex breeding programs. Estimates of genetic effects are typically more efficient under a random genotypes model than under a fixed effects model, provided the genetic variance components can be accurately estimated (Piepho, 1998; Smith et al., 2001).

It is worth mentioning that Atlin et al. (2000) do not explicitly use BLUP, and their development is basically in terms of simple genotype means, although they regard genotypes as random. Under their assumed model and balanced data setting, simple means and BLUPs will be perfectly linearly related, so the selection decision will be the same with either estimator. With unbalanced data and other variance–covariance structures, however, the two estimators will differ, and BLUP will typically be more efficient than simple means in such circumstances (Piepho, 1998).

Our BLUP procedure has two salient statistical features: shrinkage and weighting. For balanced data, shrinkage toward the mean is the same for each genotype, so shrinkage will not affect ranking compared with alternative estimators such as the mean per subregion. Thus, the differences in performance we found in comparison to other estimators were not due to shrinkage. The differences were solely due to the weighting scheme. With unbalanced data, shrinkage will differ among genotypes and so may affect ranking.
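
The role of the weighting scheme can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes $m = 2$ subregions, a compound symmetric $\Sigma_g$ with diagonal $\sigma^2_G + \sigma^2_{GS}$ and off-diagonal $\sigma^2_G$, and a hypothetical $2 \times 2$ error matrix $\Sigma_{e1}$ with common variance `var_e` and covariance `cov_e`. The function name `standardized_weights` and all input values are made up for illustration. The function returns the target-subregion row of $W = \Sigma_g(\Sigma_g + \Sigma_{e1})^{-1}$ (see Appendix B), rescaled to sum to one.

```python
from fractions import Fraction


def standardized_weights(var_g, var_gs, var_e, cov_e):
    """Target-subregion row of W = Sigma_g (Sigma_g + Sigma_e1)^-1, rescaled to sum to 1.

    Assumes two subregions and compound symmetry (illustrative assumption):
      Sigma_g  = [[var_g + var_gs, var_g], [var_g, var_g + var_gs]]
      Sigma_e1 = [[var_e, cov_e], [cov_e, var_e]]
    Returns [weight on target subregion, weight on the other subregion].
    """
    sg_diag = Fraction(var_g) + Fraction(var_gs)
    sg_off = Fraction(var_g)
    # Elements of Sigma_g + Sigma_e1 and of its 2 x 2 inverse.
    t_diag = sg_diag + Fraction(var_e)
    t_off = sg_off + Fraction(cov_e)
    det = t_diag * t_diag - t_off * t_off
    inv_diag = t_diag / det
    inv_off = -t_off / det
    # First row of Sigma_g times the inverse.
    w_target = sg_diag * inv_diag + sg_off * inv_off
    w_other = sg_diag * inv_off + sg_off * inv_diag
    total = w_target + w_other
    return [w_target / total, w_other / total]


# Hypothetical variance components, chosen only for illustration.
high_corr = standardized_weights(3, 2, 2, 1)  # genetic correlation 3/5
low_corr = standardized_weights(1, 4, 2, 1)   # genetic correlation 1/5
no_error = standardized_weights(3, 2, 0, 0)   # Sigma_e1 equal to zero

print(high_corr)  # [Fraction(23, 24), Fraction(1, 24)]
print(low_corr)   # [Fraction(11, 10), Fraction(-1, 10)]
print(no_error)   # [Fraction(1, 1), Fraction(0, 1)]
```

With these made-up inputs, lowering the genetic correlation while keeping a positive error covariance produces a small negative weight for the non-target subregion, and setting $\Sigma_{e1}$ to zero puts all weight on the target subregion. Because the same $W$ applies to every genotype under balanced data, the weights rather than the shrinkage drive any change in ranking.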
In this paper, we mainly focused on balanced data per subregion because this facilitated studying the statistical properties of our procedure. For the same reason, we considered a two-step approach, in which genotypic means per subregion are estimated in the first step and are then submitted to a BLUP procedure in the second step. In practice, a single-step procedure is more convenient and, in fact, easier to implement for routine use. Moreover, data will often be unbalanced. A single-step analysis of possibly unbalanced data is straightforward when a good mixed model package is used. All that needs to be done is to fit the mixed model outlined in Eq. [1] and [2] and let the package compute the BLUPs of genetic effects and linear functions thereof, depending on the choice of the vector $v$ (see Appendix A).

In the example, we have used the compound symmetry structure in Eq. [4] to model the genetic variance-covariance structure $\Sigma_g$. The model was used to facilitate comparison with the method of Atlin et al. (2000), which is based on this same structure. It should be stressed, however, that the compound symmetry model is rather restrictive in that variances are assumed to be the same for all subregions and that covariances are the same for each pair of subregions. It is now relatively easy to fit more general structures, including heteroscedastic and factor-analytic structures, and experience shows that such models often fit considerably better than compound symmetry (Piepho, 1998; Smith et al., 2001). When fitting more complex models, one needs to balance increased realism against the need to estimate more parameters. In the extreme case of an unstructured model, $\Sigma_g$ has $m(m+1)/2$ parameters, where $m$ is the number of subregions, and sample size may not even permit fitting of such complex models. According to the principle of parsimony, it may be preferable to fit simpler models, particularly when the sample size is limited, and there are established procedures for striking the balance between overly simplistic and overly complex models (Piepho, 1999).

Departure from the compound symmetry structure implies heterogeneity in genetic correlations, which will affect the weights $W$ in the BLUP equation. Specifically, for estimation of $g_{ir}$ in the targeted subregion, the weight of neighboring subregions increases with the genetic correlation, while subregions showing low genetic correlation with the targeted subregion are down-weighted. The dependence of weights on genetic correlations with neighboring subregions, especially in models with heterogeneity in the correlations, is both intuitively appealing and statistically desirable. It is our experience that the proposed method demonstrates its full power when models departing from compound symmetry fit the data well. This is likely to occur when heterogeneity among subregions is pronounced. A detailed account will be published elsewhere.

While modeling of $\Sigma_g$ is certainly the most crucial model selection step, other model components require attention as well. For example, heterogeneity of variance components pertaining to $e_{ijkr}$ in Eq. [1] may be expected between subregions. Also, one may model replicate data instead of trial means. This opens several options for more refined modeling at the trial level, depending on experimental design, including fitting of incomplete block effects and spatial modeling (Gilmour et al., 1997; Smith et al., 2001). The optimal properties of BLUP and the near-optimality of empirical BLUP are retained with these more general and more flexible models.

In conclusion, it should be emphasized that our approach can only improve on the prediction of genotype × region interaction, while interactions of genotype × year and genotype × year × location essentially remain unpredictable for any farm in the target region. Unfortunately, these latter effects often dominate total variation. Thus, while the method proposed in this paper is a small step forward, the problem of unpredictable interactions related to years largely continues unabated.

APPENDIX A

We present sample code in SAS for single-step local BLUP. Factors are assumed to be coded as follows: G = genotype, S = subregion, L = location, Y = year. The MIXED code for the response variable YIELD and the compound symmetry structure for $\Sigma_g$ is as follows:

    proc mixed;
      class G S L Y;
      model YIELD = S;                          /* fixed subregion effect */
      random Y S*L S*Y S*L*Y G*Y G*S*L G*S*Y;   /* environmental and genotype-by-environment effects */
      random S / sub=G type=CS solution;        /* Sigma_g: compound symmetry across subregions within genotype */
    run;

Instead of the compound symmetry structure, other models such as factor-analytic or heteroscedastic structures can be used for $\Sigma_g$ by appropriately modifying the TYPE= option in the second "random" statement (Piepho, 1999).

To reduce computation time, one may take all effects not crossed with genotypes as fixed. This will yield identical results for balanced data. With unbalanced data, this analysis will not make use of intertrial information. Since the intertrial information is often small (Patterson, 1997), the sacrifice in efficiency will usually be marginal. The reduction in computing time results from taking genotypes as the "subject" for all effects (Littell et al., 1996).

    proc mixed;
      class G S L Y;
      model YIELD = S Y S*L S*Y S*L*Y;          /* effects not crossed with genotype taken as fixed */
      random Y S*L S*Y / subject=G;             /* G*Y, G*S*L and G*S*Y */
      random S / sub=G type=CS solution;
    run;

If a genotype is missing in a particular subregion, a BLUP can still be computed. To do so, the input dataset needs to have at least one record for the subregion × genotype combination in question, coding the missing observation as a dot.

APPENDIX B

Let $y' = (y_1', y_2', \ldots, y_n')$ and $g' = (g_1', g_2', \ldots, g_n')$, where $n$ is the number of genotypes. For balanced data (see Eq. [6]) we find
$$\mathrm{var}(g) = I_n \otimes \Sigma_g,$$
$$\mathrm{var}(y) = I_n \otimes (\Sigma_g + \Sigma_{e1}) + J_n \otimes \Sigma_{e2},$$
$$\mathrm{cov}(g, y) = \mathrm{var}(g),$$
$$E(g) = 0, \text{ and}$$
$$E(y) = 1 \otimes \mu,$$

where $I_n$ is an $n$-dimensional identity matrix, $J_n$ is an $n \times n$ matrix of ones everywhere, and $\otimes$ denotes the Kronecker or direct product operator (Searle et al., 1992). The best linear unbiased predictor of $g$ is (Searle et al., 1992, p. 269)

$$\mathrm{BLUP}(g) = \tilde{g}^{o} = W^{o}[y - E(y)],$$

where

$$W^{o} = \mathrm{var}(g)[\mathrm{var}(y)]^{-1}.$$

The selection decision will depend on ranking of genotypes, which in turn is fully determined by all pairwise differences among genotypes. Any estimator that always yields the same pairwise differences as BLUP can be considered essentially equivalent to BLUP with regard to the resulting selection decision. An estimate of the pairwise difference for an estimable function of interest can be obtained by multiplication of BLUP($g$) with a contrast vector $c \otimes v$, where $c$ is an $n$-dimensional pairwise contrast vector with $c_i = 1$ and $c_{i'} = -1$ for the two genotypes to be compared and zeros elsewhere, while $v$ is a coefficient vector for the estimable function of interest. We find

$$(c' \otimes v')\tilde{g}^{o} = (c' \otimes v')W^{o}[y - E(y)]$$

with

$$W^{o} = (I_n \otimes \Sigma_g)[I_n \otimes (\Sigma_g + \Sigma_{e1}) + J_n \otimes \Sigma_{e2}]^{-1}.$$

It can be shown that

$$[I_n \otimes (\Sigma_g + \Sigma_{e1}) + J_n \otimes \Sigma_{e2}]^{-1} = [I_n \otimes I_m - J_n \otimes \Sigma_{e2}(\Sigma_g + \Sigma_{e1} + n\Sigma_{e2})^{-1}][I_n \otimes (\Sigma_g + \Sigma_{e1})^{-1}].$$

This follows from the easily established fact that

$$(I_n \otimes A + J_n \otimes B)^{-1} = [I_n \otimes I_m - J_n \otimes B(A + nB)^{-1}](I_n \otimes A^{-1}),$$

where $A$ and $B$ are $m \times m$ matrices. The rearrangements in the proof treat $A$ and $B$ as commuting ($AB = BA$), which holds, for example, when both are compound symmetric. The proof is as follows:

$$\begin{aligned}
(I_n \otimes A + J_n \otimes B)^{-1}(I_n \otimes A + J_n \otimes B)
&= [I_n \otimes I_m - J_n \otimes B(A + nB)^{-1}](I_n \otimes A^{-1})(I_n \otimes A + J_n \otimes B) \\
&= [I_n \otimes I_m - J_n \otimes B(A + nB)^{-1}](I_n \otimes I_m + J_n \otimes A^{-1}B) \\
&= I_n \otimes I_m + J_n \otimes A^{-1}B - J_n \otimes B(A + nB)^{-1} - nJ_n \otimes A^{-1}B^{2}(A + nB)^{-1} \\
&= I_n \otimes I_m + J_n \otimes (A + nB)^{-1}[A^{-1}B(A + nB) - B - nA^{-1}B^{2}] \\
&= I_n \otimes I_m.
\end{aligned}$$

Thus, we find

$$\begin{aligned}
(c' \otimes v')\tilde{g}^{o}
&= (c' \otimes v')(I_n \otimes \Sigma_g)[I_n \otimes I_m - J_n \otimes \Sigma_{e2}(\Sigma_g + \Sigma_{e1} + n\Sigma_{e2})^{-1}][I_n \otimes (\Sigma_g + \Sigma_{e1})^{-1}][y - E(y)] \\
&= (c' \otimes v')(I_n \otimes \Sigma_g)[I_n \otimes (\Sigma_g + \Sigma_{e1})^{-1}][y - E(y)] \\
&= (c' \otimes v')[I_n \otimes \Sigma_g(\Sigma_g + \Sigma_{e1})^{-1}][y - E(y)] \\
&= (c' \otimes v')[I_n \otimes W][y - E(y)]
\end{aligned}$$

with $W = \Sigma_g(\Sigma_g + \Sigma_{e1})^{-1}$. The key step in this derivation is to note that $c'J_n = 0_n'$, where $0_n$ is a null vector. It follows that, for selection purposes, it is sufficient to compute

$$\tilde{g} = [I_n \otimes W][y - E(y)].$$

This is seen to be the estimator in Eq. [11].

ACKNOWLEDGMENTS

We would like to thank the referees, whose comments helped improve the paper.

REFERENCES

Atlin, G.N., R.J. Baker, K.B. McRae, and X. Lu. 2000. Selection response in subdivided target regions. Crop Sci. 40:7–13.
Comstock, R.E., and R.H. Moll. 1963. Genotype-environment interaction. p. 164–194. In W.D. Hanson and H.F. Robinson (ed.) Statistical genetics and plant breeding. Publication 982. National Academy of Sciences-National Research Council, Washington, DC.
Curnow, R.N. 1988. The use of correlated information on treatment effects when selecting the best treatment. Biometrika 75:287–293.
Falconer, D.S., and T.F.C. Mackay. 2001. Introduction to quantitative genetics. Pearson Education, Harlow.
Gilmour, A.R., B.R. Cullis, A.P. Verbyla, and A.C. Gleeson. 1997. Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Biol. Environ. Statist. 2:269–293.
Kackar, R.N., and D.A. Harville. 1984. Approximations for standard errors of estimators of fixed and random effects in mixed linear models. J. Am. Statist. Assoc. 79:853–862.
Kish, L. 1965. Survey sampling. John Wiley & Sons, New York.
Littell, R.C., P.R. Henry, and C.B. Ammerman. 1996. Statistical analysis of repeated measures data using SAS procedures. J. Anim. Sci. 76:1216–1231.
Patterson, H.D. 1997. Analysis of series of variety trials. p. 139–161. In R.A. Kempton and P.N. Fox (ed.) Statistical methods for plant variety evaluation. Chapman and Hall, London.
Piepho, H.P. 1998. Empirical best linear unbiased prediction in cultivar trials using factor analytic variance-covariance structures. Theor. Appl. Genet. 97:195–201.
Piepho, H.P. 1999. Stability analysis using the SAS system. Agron. J. 91:154–160.
Piepho, H.P., and K. Pillen. 2004. Mixed modelling for QTL × environment interaction analysis. Euphytica 137:147–153.
Portnoy, S. 1982. Maximizing the probability of correctly ordering random variables using linear predictors. J. Multivariate Anal. 12:256–269.
Searle, S.R., G. Casella, and C.E. McCulloch. 1992. Variance components. John Wiley & Sons, New York.
Smith, A., B.R. Cullis, and R. Thompson. 2001. Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:1138–1147.
Talbot, M. 1997. Resource allocation for selection systems. p. 162–174. In R.A. Kempton and P.N. Fox (ed.) Statistical methods for plant variety evaluation. Chapman and Hall, London.
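
As a numerical companion to Appendix B, the following sketch checks in exact rational arithmetic that a pairwise genotype contrast computed from the full predictor $W^{o}[y - E(y)]$ equals the same contrast computed from the simplified form $[I_n \otimes W][y - E(y)]$. It is an illustration under stated assumptions, not part of the original analysis: the matrices `sigma_g`, `sigma_e1`, `sigma_e2`, the centered data vector `d`, and all helper names are hypothetical.

```python
from fractions import Fraction


def mat(rows):
    """Convert nested lists of integers to a matrix of exact Fractions."""
    return [[Fraction(x) for x in row] for row in rows]


def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kron(a, b):
    """Kronecker (direct) product of a and b."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def inverse(a):
    """Gauss-Jordan inverse in exact arithmetic (assumes a is nonsingular)."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


n, m = 3, 2  # three genotypes, two subregions (illustrative sizes)
I_n = mat([[int(i == j) for j in range(n)] for i in range(n)])
J_n = mat([[1] * n for _ in range(n)])

# Hypothetical variance-covariance matrices (m x m).
sigma_g = mat([[5, 3], [3, 5]])
sigma_e1 = mat([[2, 1], [1, 2]])
sigma_e2 = mat([[1, 0], [0, 1]])

a = matadd(sigma_g, sigma_e1)
var_y = matadd(kron(I_n, a), kron(J_n, sigma_e2))
w_full = matmul(kron(I_n, sigma_g), inverse(var_y))  # W^o
w = matmul(sigma_g, inverse(a))                      # W
w_simple = kron(I_n, w)                              # I_n (x) W

# Hypothetical centered data y - E(y), ordered genotype by genotype.
d = mat([[4], [1], [-2], [3], [0], [-6]])

# Contrast: genotype 1 minus genotype 2, evaluated in subregion 1.
contrast = kron(mat([[1, -1, 0]]), mat([[1, 0]]))

diff_full = matmul(contrast, matmul(w_full, d))[0][0]
diff_simple = matmul(contrast, matmul(w_simple, d))[0][0]
print(diff_full, diff_simple)  # 136/33 136/33
```

The two weight matrices differ, since `w_full` also carries the $J_n \otimes (\cdot)$ term, yet the contrast is identical because $c'J_n$ vanishes.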
