SUMMARY
A simple objective function in terms of undeflated X is derived for the latent variables of multivariate PLS
regression. The objective function fits into the basic framework put forward by Burnham et al. (J. Chemometrics,
10, 31–45 (1996)). We show that PLS and SIMPLS differ in the constraint put on the length of the X-weight
vector. It turns out that PLS does not penalize the length of the part of the weight vector that can be expressed
as a linear combination of the preceding weights, whereas SIMPLS does. By using artificial data sets, it is shown
that it depends on the data which of the two methods explains the larger amount of variance in X and Y. The
objective function framework adds insight to the nature of PLS and SIMPLS and how they relate to other
methods. In addition, we present an implicit deflation algorithm for PLS, explain why PLS and SIMPLS become
equivalent when Y changes from multivariate to univariate, and list some geometrical results that may also prove
useful in the study of other latent variable methods.
KEY WORDS: PLS; SIMPLS; multivariate regression; latent variable; covariance maximization; objective function
1. INTRODUCTION
In many applications, data from two different measurement systems are to be related, in which each
measurement system produces data on a large number of variables. To get more insight into the data and
the underlying chemical processes, it is often helpful to reduce the number of variables of each set to
a smaller number of latent variables or factors. This idea is quite natural, because the variables are
often highly correlated. Once variables are aggregated into sums or other linear combinations, the
individual variables become essentially redundant. There are, however, a number of rival latent
variable methods for relating two sets of variables. Choosing a method that is fit for the job at hand
is easy when each method has a clearly delimited domain of application. As a means towards
achieving this aim, Burnham et al.1 developed a theoretical framework for latent variable methods. For
convenience the two sets of variables are indicated by X and Y after the $n \times p$ and $n \times m$ data matrices
X and Y in which the values of the p predictors and m responses are stored, with n the number of
samples. Often the relation between X and Y is asymmetric in that Y is to be predicted from X, but
the framework also contains methods for cases in which the relation is symmetric. A unifying
framework highlights both the similarities and dissimilarities between different methods.
In the basic framework the latent variables are extracted on the basis of maximization of the
covariance between linear combinations of the columns of X and linear combinations of the columns
of Y, subject to constraints on the weight vectors that define the linear combinations. Canonical
correlation-based regression (CCR), reduced rank regression (RRR) and simple partial least squares
regression (SIMPLS) easily fit into this basic framework, but standard partial least squares regression
(PLS) does not, except for the extraction of the first factor (or when Y is unidimensional). As in
Reference 1, the abbreviation PLS will be used throughout to mean PLS regression for the multivariate
case.2 The problem with PLS is that the second, third and later factors are extracted from residual
matrices that are obtained from X and Y by successive deflation with respect to the preceding X-factor
scores.
After a careful study of the deflation process, Burnham et al.1 extended the basic framework to
incorporate all mentioned methods. The objective function of the extended framework is, however,
rather complicated. The constraints remain simple. Burnham et al.1 question the rationale for the
deflation process in PLS in an appendix and emphasize their point of view by proposing undeflated
PLS (UPLS), but ‘more as an illustration of the impact of the deflation process on PLS than as any
real improvement over existing methods’. UPLS is equivalent to a singular value decomposition of
$X^\mathrm{T}Y$. History repeats itself: SIMPLS was also the result of an attempt to better understand PLS. The
results of PLS and SIMPLS are often very similar1, 3 and are identical if Y is univariate.3 SIMPLS and
PLS thus have approximately the same predictive power. Both numerically and theoretically, SIMPLS
has advantages over standard PLS: the algorithms for SIMPLS are more efficient than the algorithms
for PLS, and the objective function for SIMPLS is more elegant and easier to understand than the
objective function for PLS. The question is therefore whether PLS is better replaced by SIMPLS.
As Burnham et al. write, ‘two papers which claim to have an objective function for PLS beyond the
first pair of latent variables are actually describing SIMPLS’. This has no consequence for univariate
y,4–6 but has, at least potentially, consequences for multivariate Y.7 For ecological applications one of
us attempted to extend PLS in the correspondence analysis (CA) framework (Appendix of Reference
8) but used the wrong definitions for PLS. Compared with SIMPLS, one extra constraint was proposed
in which the Y-factors were constrained to be orthogonal to the preceding X-factor scores. This
constraint holds in reduced rank regression and canonical correlation analysis but not necessarily in
PLS or SIMPLS. The misunderstanding was, at least partially, due to the fact that algorithms that are
efficient for ecological data exploit the sparseness of the X- and Y-matrices. Because deflation
destroys the sparseness, iterative algorithms have been developed in which deflation is achieved
implicitly by orthogonalizing both the X-factor scores and the Y-factor scores with respect to the
preceding X-factors (Appendix I). In the algorithm of Appendix I, PLS works directly on the original
data matrices X and Y. It therefore comes as a surprise that the objective function of PLS cannot be
formulated in a simple way in terms of the original data matrices.
In this paper a simple objective function for the latent variables of PLS is derived which fits into
the basic framework of Burnham et al. We show that PLS and SIMPLS differ in the constraint put on
the length of the X-weight vector and resolve some of the questions that Burnham et al. express about
the deflation process in PLS. The paper is organized as follows. In Section 2 we summarize standard
PLS and prepare for the derivation of the objective function of PLS in terms of the original matrices.
In Section 3 we discuss the objective function frameworks of Burnham et al. and show how a slight
generalization allows both standard PLS and SIMPLS to fit naturally in the same framework. Having
found how PLS can be posed as a constrained maximization problem, avoiding deflated X-matrices,
we proceed in Section 4 to discuss the subtle distinction between SIMPLS and PLS. Some specific
aspects are deferred to appendices.
We use the notation that $A^-$ is a generalized inverse of the matrix A and $A^+$ is the unique Moore–Penrose inverse.9 If A has full column rank, $A^+ = (A^\mathrm{T}A)^{-1}A^\mathrm{T}$. Note that $AA^+$ is the orthogonal projector on the column space of A, which reduces to the more familiar form $A(A^\mathrm{T}A)^{-1}A^\mathrm{T}$ if A has full column rank. With $I_n$ we denote the identity matrix of order n. We assume throughout that the predictors and response variables in X and Y are centred in a preprocessing step, because this allows us to interpret inner products as covariances (up to a constant factor).
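As a concrete illustration of this notation, the following Python/numpy sketch (ours, not part of the original paper) checks the stated properties of $A^+$ and $AA^+$ numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3))        # full column rank with probability 1

    A_plus = np.linalg.pinv(A)             # Moore-Penrose inverse A^+
    # for full column rank, A^+ = (A^T A)^{-1} A^T
    assert np.allclose(A_plus, np.linalg.inv(A.T @ A) @ A.T)

    # A A^+ is the orthogonal projector on the column space of A:
    # symmetric, idempotent and leaving the columns of A unchanged
    proj = A @ A_plus
    assert np.allclose(proj, proj.T)
    assert np.allclose(proj @ proj, proj)
    assert np.allclose(proj @ A, A)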
2. STANDARD PLS REGRESSION

The latent variables (factors) of PLS are derived sequentially. This section describes how the ith pair of latent variables is derived from the data matrices X and Y and the preceding $i-1$ latent variables.
PLS is a regression method. Thus we concentrate on how the latent X-space is constructed. This latent space is spanned by orthogonal factors $t_1, t_2, \ldots, t_i$, which are linear combinations of the X-variables. At this point we are not yet concerned about how the optimal weight vectors are selected. We do not even require at this point that the X-factors are orthogonal. Let $X_1 = X$ and $Y_1 = Y$ and let $X_j$ and $Y_j$ ($j = 2, 3, \ldots$) denote the residual (or deflated) matrices that are obtained from X and Y by successive deflation with respect to the preceding X-factor scores $t_1, t_2, \ldots, t_{j-1}$, with j indicating the number of the factor to be extracted.1,2 The X-factor $t_j$ can be defined in terms of either the original matrix X or the deflated matrix $X_j$ by
$t_j = Xr_j = X_jw_j, \quad j = 1, 2, \ldots, i$   (1)

Each factor has therefore two associated weight vectors ($r_j$ and $w_j$ respectively), which coincide for $j = 1$ because $X_1 = X$. The results for the first $i-1$ factors are collected in the matrix $T = [t_1, t_2, \ldots, t_{i-1}]$, with $i \ge 2$. The corresponding weights $\{r_j\}$ and $\{w_j\}$ ($j = 1, 2, \ldots, i-1$) are collected analogously in the matrices R and W. With this notation, for $i \ge 2$, $T = XR$ and

$P = X^\mathrm{T}T(T^\mathrm{T}T)^{-1} = X^\mathrm{T}XR(R^\mathrm{T}X^\mathrm{T}XR)^{-1}$   (2)
where the columns of P contain the X-loadings. With (2),
$X_i = X - TP^\mathrm{T} = X - XRP^\mathrm{T} = X(I_p - RP^\mathrm{T})$   (3)
but also
$X_i = X - TP^\mathrm{T} = X - T(T^\mathrm{T}T)^{-1}T^\mathrm{T}X = (I_n - TT^+)X$   (4)

i.e. $X_i$ is the matrix of residuals from the regression of the X-variables on the preceding X-factors in T. The Y-matrix is deflated analogously with respect to T, i.e. $Y_i = (I_n - TT^+)Y$.
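The equality of the deflation routes (3) and (4) is easy to verify numerically; the following sketch (ours) does so for an arbitrary full-column-rank weight matrix R:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, k = 8, 5, 2
    X = rng.standard_normal((n, p)); X -= X.mean(axis=0)   # centred, as assumed
    R = rng.standard_normal((p, k))                        # any weights will do

    T = X @ R
    P = X.T @ T @ np.linalg.inv(T.T @ T)                   # X-loadings, eq. (2)

    Xi_eq3 = X @ (np.eye(p) - R @ P.T)                     # equation (3)
    Xi_eq4 = (np.eye(n) - T @ np.linalg.pinv(T)) @ X       # equation (4)
    assert np.allclose(Xi_eq3, Xi_eq4)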
It is now easy to formulate the objective function of PLS regression in terms of the deflated X- and Y-matrices: for the ith pair of latent variables, PLS selects linear combinations of the deflated X-variables and deflated Y-variables, $t = X_iw$ and $u = Y_ic$ respectively, that have maximum covariance:

$t^\mathrm{T}u = w^\mathrm{T}X_i^\mathrm{T}Y_ic$ subject to $w^\mathrm{T}w = c^\mathrm{T}c = 1$   (5)
The optimal X-weight vector $w_i$ is the first eigenvector of the eigenvalue problem

$X_i^\mathrm{T}Y_iY_i^\mathrm{T}X_iw = \lambda_iw$   (6)

and the optimal Y-weight vector $c_i$ is proportional to $Y_i^\mathrm{T}X_iw_i$. Only one of either X or Y needs to be deflated,10 because $X_i^\mathrm{T}Y_i = X^\mathrm{T}(I_n - TT^+)(I_n - TT^+)Y = X^\mathrm{T}(I_n - TT^+)Y$, which is equal to both $X_i^\mathrm{T}Y$ and $X^\mathrm{T}Y_i$. From (6) and on deflating X only, $w_i$ also maximizes

$w^\mathrm{T}X_i^\mathrm{T}YY^\mathrm{T}X_iw$ subject to $w^\mathrm{T}w = 1$   (7)
The weight vectors $\{w_j\}$ derived in this way are mutually orthogonal2 and so are the X-factors $\{t_j\}$ ($j = 1, 2, \ldots, i$). From (1) and (3),

$r_i = (I_p - RP^\mathrm{T})w_i$   (8)
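For concreteness, a minimal deflation-based extraction of the PLS factors according to (1) and (5)-(7) might look as follows (a sketch written for this exposition, not the paper's own code; names are ours):

    import numpy as np

    def pls_factors(X, Y, n_factors):
        """X-weights W (for the deflated X), scores T and loadings P."""
        Xi = X.copy()
        W, T, P = [], [], []
        for _ in range(n_factors):
            # w maximizes (7): first left singular vector of Xi^T Y
            w = np.linalg.svd(Xi.T @ Y)[0][:, 0]
            t = Xi @ w                        # X-factor score, equation (1)
            p = Xi.T @ t / (t @ t)            # loading on t
            Xi = Xi - np.outer(t, p)          # deflate X; Y need not be deflated
            W.append(w); T.append(t); P.append(p)
        return np.array(W).T, np.array(T).T, np.array(P).T

    rng = np.random.default_rng(2)
    X = rng.standard_normal((10, 6)); X -= X.mean(axis=0)
    Y = rng.standard_normal((10, 3)); Y -= Y.mean(axis=0)
    W, T, P = pls_factors(X, Y, 3)
    # the weights and the X-factors are mutually orthogonal, as stated above
    assert np.allclose(W.T @ W, np.eye(3), atol=1e-8)
    G = T.T @ T
    assert np.allclose(G, np.diag(np.diag(G)), atol=1e-8)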
$\max_{a_i,b_i}\; a_i^\mathrm{T}X^\mathrm{T}Yb_i$ subject to $a_i^\mathrm{T}M_1a_i = 1$, $b_i^\mathrm{T}M_2b_i = 1$ and $a_j^\mathrm{T}M_3a_i = 0$, $1 \le j < i$   (15)

$\max_{a_i}\; a_i^\mathrm{T}X^\mathrm{T}YM_2^-Y^\mathrm{T}Xa_i$ subject to $a_i^\mathrm{T}M_1a_i = 1$ and $C_ia_i = 0_{i-1}$   (16)
where the last constraint is dropped when i=1. This can be seen by applying the Lagrange multiplier
method9 to (15). Zhu and Barnes13 and Hinkle and Rayens14 applied the Lagrange method to the objective function of SIMPLS, reproducing the solution found by de Jong.3 The Lagrange method provides the conditions for stationary points of (15). These conditions are necessary for the maximizer of (15). The conditions obtained by the Lagrange method are

$X^\mathrm{T}Yb_i - \lambda_1^*M_1a_i - C_i^\mathrm{T}\mu^* = 0_p$ and $\lambda_2^*M_2b_i - Y^\mathrm{T}Xa_i = 0_m$   (17)

where $\lambda_1^*$, $\lambda_2^*$ and $\mu^*$ are the Lagrange multipliers, one for each constraint. On solving for $b_i$ from the second equation in (17) and inserting the solution in the first, we obtain, with $\lambda_i \equiv \lambda_1^*\lambda_2^*$ and $\mu \equiv \lambda_2^*\mu^*$,
$X^\mathrm{T}YM_2^-Y^\mathrm{T}Xa_i - \lambda_iM_1a_i - C_i^\mathrm{T}\mu = 0_p$   (18)
This equation is also obtained by applying the Lagrange method directly to framework (16). By premultiplying (18) by a projector $I_p - \Pi$ that satisfies the equations

$(I_p - \Pi)C_i^\mathrm{T} = O_{p\times(i-1)}$ and $(I_p - \Pi)M_1a_i = M_1a_i$   (19)

the equation can be written as the generalized eigenvalue problem

$(I_p - \Pi)X^\mathrm{T}YM_2^-Y^\mathrm{T}Xa_i = \lambda_iM_1a_i$   (20)
The eigenvectors of (20) are stationary points of (16) with the eigenvalues as the corresponding values of the maximand. The first eigenvector is therefore the global maximizer of (16). With $b_i$ solved from (17) and given the proper length, i.e.

$b_i = M_2^-Y^\mathrm{T}Xa_i/(a_i^\mathrm{T}X^\mathrm{T}YM_2^-Y^\mathrm{T}Xa_i)^{1/2}$   (21)

the global maximizers of (15) are obtained. By comparing (20) with (14) and considering the requirements in (19), we derive that PLS fits in framework (16), with $a_i = r$, by defining

$M_1 = I_p - RR^+$, $M_2 = I_m$, $\Pi = PR^\mathrm{T}$ and $C_i = P^\mathrm{T}$   (22)
With these definitions the requirements of (19) read

$(I_p - PR^\mathrm{T})P = O_{p\times(i-1)}$ and $(I_p - PR^\mathrm{T})(I_p - RR^+)a_i = (I_p - RR^+)a_i$   (23)

The requirements of (23) hold true, as follows from (35) and (44).
With $C_i = P^\mathrm{T}$ the last constraint in (16), $C_ia_i = 0_{i-1}$, is equivalent to the last set of constraints of Framework A, $a_j^\mathrm{T}M_3a_i = 0$ ($1 \le j < i$), by defining $M_3 = X^\mathrm{T}X$. See the definition of P in (2). With the other definitions of (22) this demonstrates that PLS fits in Framework A. Because $M_3 = X^\mathrm{T}X$, PLS fits in both Frameworks 2 and 4 of Burnham et al. when $M_1$ is allowed to be singular. Table 1 copies Table 3 from Burnham et al. with a column for PLS added. For clarity, Table 1 also specifies the matrices $C_i$ and $\Pi$ used in each method.
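That PLS indeed satisfies the constraints of framework (16) with the definitions (22) can be checked numerically. The sketch below (ours) extracts a few PLS factors, forms $r_i$ via (8), and verifies $P^\mathrm{T}r_i = 0_{i-1}$ and $r_i^\mathrm{T}(I_p - RR^+)r_i = 1$:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((12, 5)); X -= X.mean(axis=0)
    Y = rng.standard_normal((12, 2)); Y -= Y.mean(axis=0)

    Xi = X.copy()
    R = np.zeros((5, 0)); P = np.zeros((5, 0))
    for i in range(3):
        w = np.linalg.svd(Xi.T @ Y)[0][:, 0]
        r = w - R @ (P.T @ w)                  # equation (8): t = Xr = Xi w
        t = X @ r
        p = X.T @ t / (t @ t)
        if i > 0:
            assert np.allclose(P.T @ r, 0, atol=1e-8)   # C_i a_i = 0_{i-1}
            M1 = np.eye(5) - R @ np.linalg.pinv(R)      # M1 = I_p - RR^+
            assert np.isclose(r @ M1 @ r, 1.0)          # a_i^T M1 a_i = 1
        R = np.column_stack([R, r]); P = np.column_stack([P, p])
        Xi = Xi - np.outer(t, Xi.T @ t / (t @ t))       # deflate for next factor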
Before closing this section, we have two more observations on the framework and its solution. The
first observation is on the number of eigenvalue problems that need to be solved. Equation (20)
specifies that each latent variable is derived as the first eigenvector of a new eigenvalue problem. The
eigenvalue problems solved for different latent variables differ, because the matrix $\Pi$ depends, via P and R, on the latent variables preceding any particular latent variable. This is an essential feature of
PLS and SIMPLS. However, if $M_1 = M_3$, as in the other methods of Table 1, we need to solve only a single eigenvalue problem. The second, third and higher latent variables are then the second, third and higher eigenvectors of the eigenvalue problem solved for the first latent variable. This happens because these higher eigenvectors automatically satisfy the orthogonality constraints $a_j^\mathrm{T}M_3a_i = 0$ ($i \neq j$) if $M_3 = M_1$. In this case the ith X-factor $t_i = Xa_i$ is orthogonal to the jth Y-factor $u_j = Yb_j$ and $b_i^\mathrm{T}M_2b_j = 0$ ($i \neq j$). The second observation is that if $X^\mathrm{T}X = I_p$, as in some designed experiments, all methods of Table 1, except CCR, coincide. This is immediate from the definitions in Table 1, except for PLS. The equivalence of PLS and SIMPLS in this particular case follows from the observation that if $X^\mathrm{T}X = I_p$, then $P^\mathrm{T}a_i = R^\mathrm{T}a_i = 0_{i-1}$ and $a_i^\mathrm{T}(I_p - RR^+)a_i = a_i^\mathrm{T}a_i = 1$.
Table 1. The matrices defining each method in the framework (after Table 3 of Burnham et al.,1 with a column for PLS added)

           CCR      RRR      PLS           SIMPLS   UPLS
M_1        X^T X    X^T X    I_p - RR^+    I_p      I_p
M_2        Y^T Y    I_m      I_m           I_m      I_m
M_3        X^T X    X^T X    X^T X         X^T X    I_p
C_i^T      P        P        P             P        R
Π          PR^T     PR^T     PR^T          PP^+     RR^+
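The coincidence of PLS and SIMPLS when $X^\mathrm{T}X = I_p$ can be illustrated numerically. In the sketch below (ours; the SIMPLS step deflates $S = X^\mathrm{T}Y$ as in de Jong3), the columns of X are first orthonormalized, after which the two methods return the same factors up to sign:

    import numpy as np

    rng = np.random.default_rng(4)
    Z = rng.standard_normal((12, 4)); Z -= Z.mean(axis=0)
    X, _ = np.linalg.qr(Z)                   # centred columns with X^T X = I_4
    Y = rng.standard_normal((12, 3)); Y -= Y.mean(axis=0)

    def pls_scores(X, Y, k):
        Xi, T = X.copy(), []
        for _ in range(k):
            w = np.linalg.svd(Xi.T @ Y)[0][:, 0]
            t = Xi @ w
            Xi = Xi - np.outer(t, Xi.T @ t / (t @ t))
            T.append(t)
        return np.array(T).T

    def simpls_scores(X, Y, k):
        S, T, V = X.T @ Y, [], np.zeros((X.shape[1], 0))
        for _ in range(k):
            r = np.linalg.svd(S)[0][:, 0]
            t = X @ r
            v = X.T @ t - V @ (V.T @ (X.T @ t))   # orthogonalized loading
            v = v / np.linalg.norm(v)
            V = np.column_stack([V, v])
            S = S - np.outer(v, v @ S)            # deflate the cross-product
            T.append(t)
        return np.array(T).T

    Tn, Ts = pls_scores(X, Y, 3), simpls_scores(X, Y, 3)
    cos = np.sum(Tn * Ts, axis=0) / (np.linalg.norm(Tn, axis=0)
                                     * np.linalg.norm(Ts, axis=0))
    assert np.allclose(np.abs(cos), 1.0)      # identical factors up to sign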
Table 2. Artificial example data sets (X, Y1) and (X, Y2)

          X              Y1            Y2
   -4   -2    1        4    0        2    0
   -4    2   -1       -4   -1       -2   -1
    4   -2   -1       -2    1       -1    1
    4    2    1        2    0        1    0
may increase, because R contains more columns and thus allows more freedom when more factors
have been extracted.
Table 3. Main results (X-weights and X-scores) for PLS and SIMPLS regression of (X, Y1) data from Table 2 (factors 1–3 for each method; numerical entries not reproduced here)

The interpretation of (26) is that PLS does not penalize the length of $Rd$, i.e. the part of the weight vector r that can be expressed as a linear combination of the preceding weights. Does this make sense? In the predictive model for Y based on i latent variables, the i weight vectors are freely combined in different linear combinations to form the final regression coefficient matrix B of the model $\hat{Y} = XB$. This can be seen as follows. From the regression of Y on T, giving $\hat{Y} = T\tilde{Q}^\mathrm{T}$, with $\tilde{Q} = Y^\mathrm{T}T(T^\mathrm{T}T)^{-1}$ the matrix of Y-loadings with respect to T, we obtain $\hat{Y} = T\tilde{Q}^\mathrm{T} = XR\tilde{Q}^\mathrm{T} = XB$, so that $B = R\tilde{Q}^\mathrm{T}$. No constraint is applied while estimating $\tilde{Q}$, because $\tilde{Q}$ is obtained by unrestricted least squares regression of Y on T. This holds true for both PLS and SIMPLS. Because the predictive model using $i-1$ latent variables uses unconstrained combinations of the first $i-1$ weight vectors in R, we argue that the ith weight vector may also use the preceding weight vectors in an unconstrained fashion. From this point of view there is no reason for penalizing the length of the part of the weight vector that can be expressed as a linear combination of the preceding weights.
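A small numerical sketch (ours) of this argument: whatever weight matrix R is supplied, B is formed from R through the unrestricted least squares loadings $\tilde{Q}$:

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.standard_normal((15, 6)); X -= X.mean(axis=0)
    Y = rng.standard_normal((15, 2)); Y -= Y.mean(axis=0)
    R = rng.standard_normal((6, 3))   # stand-in; PLS or SIMPLS would supply these

    T = X @ R
    Q = Y.T @ T @ np.linalg.inv(T.T @ T)   # Y-loadings with respect to T
    B = R @ Q.T                            # final regression coefficients
    assert np.allclose(X @ B, T @ Q.T)     # Y-hat = T Q~^T = XB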
Table 1 shows that PLS uses the same matrix for $\Pi$ as CCR and RRR, namely $PR^\mathrm{T}$, the projector on P in the metric $(X^\mathrm{T}X)^-$, whereas SIMPLS uses the orthogonal projector on P. In the dual space of objects, $\mathbb{R}^n$, PLS uses the orthogonal projector on T, whereas SIMPLS uses an oblique projector, namely the projector on $XX^\mathrm{T}T$ in the metric $(XX^\mathrm{T})^-$. Least squares regression problems use the orthogonal projector in $\mathbb{R}^n$. It is unclear to us how to value these geometrical differences between PLS and SIMPLS.
Because PLS constrains the X-weight vector less than SIMPLS, one might conjecture that PLS explains more of the Y-variance than SIMPLS. We first approach this conjecture mathematically and then prove the conjecture to be false by giving a counter-example. The additional Y-variance explained by an orthogonal factor $t = Xr$ is

$r^\mathrm{T}X^\mathrm{T}YY^\mathrm{T}Xr/(t^\mathrm{T}t)$   (27)

where the numerator is equal to the objective function (16) used by PLS and SIMPLS ($M_2 = I_m$). The maximum of (16) attained is the eigenvalue $\lambda$. One can prove that $\lambda_2^N \ge \lambda_2^S$ (N: standard PLS; S: SIMPLS), but this relation does not necessarily imply that PLS explains more Y-variance than SIMPLS with two factors, because of the denominator of (27). We now present an example and a counter-example of the conjecture. For the simple data examples given in Table 2, we find that in one case (Y1) two-factor PLS, compared with SIMPLS, explains more variance in the Y-data and less in the X-data, whereas in the other case (Y2) these roles are reversed (Table 4). These examples show that the conjecture is false; it depends on the data which method explains the larger amount of variance in X or Y.
With real data we do not readily observe differences of even a few per cent. With artificial data one may generate much larger differences than those of Table 4. These occur when at some stage the deflated $X^\mathrm{T}Y$ has almost coinciding singular values. The associated pair of factors may then enter the model in reversed order for one method compared with the other. Hence for a given dimensionality the two models may differ greatly, a difference that may largely disappear after introducing the next factor.
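The quantities behind Table 4 can be recomputed from the data of Table 2 in a few lines. The sketch below (ours; PLS deflates X, SIMPLS deflates $S = X^\mathrm{T}Y$ as in de Jong3) prints the percentages of variance explained by two factors rather than asserting them:

    import numpy as np

    X  = np.array([[-4., -2.,  1.], [-4.,  2., -1.],
                   [ 4., -2., -1.], [ 4.,  2.,  1.]])
    Y1 = np.array([[ 4.,  0.], [-4., -1.], [-2.,  1.], [ 2.,  0.]])
    Y2 = np.array([[ 2.,  0.], [-2., -1.], [-1.,  1.], [ 1.,  0.]])

    def pls_scores(X, Y, k):
        Xi, T = X.copy(), []
        for _ in range(k):
            w = np.linalg.svd(Xi.T @ Y)[0][:, 0]
            t = Xi @ w
            Xi = Xi - np.outer(t, Xi.T @ t / (t @ t))
            T.append(t)
        return np.array(T).T

    def simpls_scores(X, Y, k):
        S, T, V = X.T @ Y, [], np.zeros((X.shape[1], 0))
        for _ in range(k):
            r = np.linalg.svd(S)[0][:, 0]
            t = X @ r
            v = X.T @ t - V @ (V.T @ (X.T @ t))
            v = v / np.linalg.norm(v)
            V = np.column_stack([V, v])
            S = S - np.outer(v, v @ S)
            T.append(t)
        return np.array(T).T

    def pct(T, M):                     # % variance of M explained by scores T
        fit = T @ np.linalg.pinv(T) @ M
        return 100 * np.sum(fit ** 2) / np.sum(M ** 2)

    for Y, name in [(Y1, "Y1"), (Y2, "Y2")]:
        for f, method in [(pls_scores, "PLS"), (simpls_scores, "SIMPLS")]:
            T = f(X, Y, 2)
            print(f"{name} {method:6s}: X {pct(T, X):5.1f}%  Y {pct(T, Y):5.1f}%")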
Because the objective function framework applies irrespective of the dimension of Y, the objective
functions of PLS and SIMPLS also differ when Y is univariate. Nevertheless, PLS and SIMPLS are
known to be equivalent when Y is univariate.3 We resolve this apparent paradox in Appendix III and
also explain why PLS and SIMPLS become equivalent when Y changes from multivariate to
univariate.
Table 4. Percentage of variance in X and Y explained by two PLS or SIMPLS factors, for Y1 regressed on X (columns X, Y1) and for Y2 regressed on X (columns X, Y2); numerical entries not reproduced here
5. CONCLUSIONS
Burnham et al.1 were the first to develop an objective function for all latent variables of multivariate
PLS regression that did not involve deflated matrices. This allowed PLS to be placed in an objective
function framework among its rival methods such as SIMPLS, undeflated PLS, reduced rank
regression and canonical correlation-based regression. That framework was a rather complicated
adaptation of a basic framework so as to accommodate PLS alongside its rivals. In the basic
framework, latent variables are simply chosen by constrained maximization of the covariance between
the linear combinations of the original variables. Using the Lagrange multiplier method, we show in
this paper that PLS can also be placed in the basic framework. This makes it easier to compare and
contrast PLS with the other multivariate methods. Table 1 lists the defining constraints of each
method.
We use the framework to highlight the similarities and dissimilarities between PLS and SIMPLS.
We show that PLS and SIMPLS differ in the constraint put on the length of the X-weight vector. PLS
does not penalize the length of the part of the weight vector that can be expressed as a linear
combination of the preceding weights, whereas SIMPLS does. In the predictive model for Y based on
A latent variables, the A weight vectors are freely combined in different linear combinations for
different Y-variables to form the matrix of final regression coefficients B of the model Ŷ = XB. This
holds true for both PLS and SIMPLS. Because the predictive model using A latent variables already
uses unconstrained combinations of the first A weight vectors, there is little basis to constrain their
usage in forming the (A+1)th vector. These results add to the understanding of the deflation process
in PLS.
Because PLS constrains the X-weight vector less than SIMPLS, one might conjecture that PLS
explains more of the Y-variance than SIMPLS. However, this conjecture does not hold true, as we
prove by giving a counter-example. With two artificial example data sets we demonstrate that it
depends on the data which of the two methods explains the larger amount of variance in X or Y. Like
Burnham et al.,1 we do not readily observe large differences between PLS and SIMPLS with real data.
In our experience the subtle theoretical differences between PLS and SIMPLS do not have important
consequences in practical applications.
We hope that the objective function framework and the associated geometrical results in Appendix
II contribute to a better understanding of the nature of PLS and SIMPLS and their relation to other
latent variable methods.
On inserting Step 5 in Step 6 and then, sequentially, Steps 4, 3, 2 and 1 in the result, and on accounting for the rescaling of w in Step 7 by $\lambda = \mathrm{norm}(w - w_0) = \lVert w - w_0 \rVert$, we obtain a single eigenvalue equation for w. Upon convergence, $\lambda$ is the eigenvalue. The weight vectors obtained from the algorithm are thus in terms of the deflated matrices and are therefore denoted by w and c as in the main text. Beyond the first dimension the Y-factor obtained from the algorithm is $u = Y_ic$, which is orthogonal to the preceding X-factors in T. The Y-factor defined in Framework A in (15), $u^* = Yb_i$ with $b_i$ defined by (21), is in general not orthogonal to T. In contrast with $u^*$, u is not in the column space of Y.1 The Y-factors are related by $u \propto (I_n - TT^+)u^*$.
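The steps of the algorithm are not reproduced above, but an implicitly deflating power algorithm of the kind described can be sketched as follows (ours, under the stated assumptions: X and Y are never deflated, the trial scores are orthogonalized against the preceding X-factors in T, and $\lambda$ is recovered as the norm of the updated weight vector):

    import numpy as np

    def next_factor(X, Y, T, n_iter=100):
        """One further PLS factor from the original X and Y, given scores T."""
        if T.shape[1]:
            Pi = T @ np.linalg.pinv(T)           # orthogonal projector on T
            proj = lambda v: v - Pi @ v
        else:
            proj = lambda v: v
        w = np.linalg.svd(X.T @ Y)[0][:, 0]      # any nonzero start would do
        lam = 0.0
        for _ in range(n_iter):
            t = proj(X @ w)                      # X-score, implicitly deflated
            u = proj(Y @ (Y.T @ t))              # Y-score, implicitly deflated
            w_new = X.T @ u
            lam = np.linalg.norm(w_new)          # upon convergence: eigenvalue
            w = w_new / lam
        return w, proj(X @ w), lam

    rng = np.random.default_rng(6)
    X = rng.standard_normal((20, 5)); X -= X.mean(axis=0)
    Y = rng.standard_normal((20, 3)); Y -= Y.mean(axis=0)

    T = np.zeros((20, 0))
    for _ in range(3):
        w, t, lam = next_factor(X, Y, T)
        T = np.column_stack([T, t])
    G = T.T @ T               # X-factors orthogonal without deflating X or Y
    assert np.allclose(G, np.diag(np.diag(G)), atol=1e-6)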
From the equality (33) or, alternatively, from (35) and the definition of $P^+$ and $R^+$ it is now easy to derive that

$(I_p - RP^\mathrm{T})(I_p - RR^+) = (I_p - RP^\mathrm{T})$   (39)
$(I_p - RR^+)(I_p - RP^\mathrm{T}) = (I_p - RR^+)$   (40)
$(I_p - PR^\mathrm{T})(I_p - PP^+) = (I_p - PR^\mathrm{T})$   (41)
$(I_p - PP^+)(I_p - PR^\mathrm{T}) = (I_p - PP^+)$   (42)

On transposing both sides of (39)–(42), we obtain

$(I_p - RR^+)(I_p - PR^\mathrm{T}) = (I_p - PR^\mathrm{T})$   (43)
$(I_p - PR^\mathrm{T})(I_p - RR^+) = (I_p - RR^+)$   (44)
$(I_p - PP^+)(I_p - RP^\mathrm{T}) = (I_p - RP^\mathrm{T})$   (45)
$(I_p - RP^\mathrm{T})(I_p - PP^+) = (I_p - PP^+)$   (46)

respectively, because $RR^+$ and $PP^+$ are symmetric. From (43) and (41)

$(I_p - PR^\mathrm{T}) = (I_p - RR^+)(I_p - PR^\mathrm{T}) = (I_p - RR^+)(I_p - PR^\mathrm{T})(I_p - PP^+)$   (47)

and from (44) and (40)

$(I_p - RR^+) = (I_p - PR^\mathrm{T})(I_p - RR^+) = (I_p - PR^\mathrm{T})(I_p - RR^+)(I_p - RP^\mathrm{T})$   (48)
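These identities are easy to spot-check numerically. They hinge on $P^\mathrm{T}R = R^\mathrm{T}P = I_{i-1}$, which in PLS follows from the orthogonality of the X-factor scores; the sketch below (ours) constructs an arbitrary pair R, P with this property and verifies (39)-(42), from which (43)-(48) follow by transposition:

    import numpy as np

    rng = np.random.default_rng(7)
    p, k = 6, 3
    R = rng.standard_normal((p, k))
    Rp = np.linalg.pinv(R)
    # any P with P^T R = I_k: the minimal choice R(R^T R)^{-1} = (R^+)^T
    # plus an arbitrary part orthogonal to the columns of R
    P = Rp.T + (np.eye(p) - R @ Rp) @ rng.standard_normal((p, k))
    assert np.allclose(P.T @ R, np.eye(k))

    I, Pp = np.eye(p), np.linalg.pinv(P)
    checks = [((I - R @ P.T) @ (I - R @ Rp), I - R @ P.T),   # (39)
              ((I - R @ Rp) @ (I - R @ P.T), I - R @ Rp),    # (40)
              ((I - P @ R.T) @ (I - P @ Pp), I - P @ R.T),   # (41)
              ((I - P @ Pp) @ (I - P @ R.T), I - P @ Pp)]    # (42)
    for lhs, rhs in checks:
        assert np.allclose(lhs, rhs)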
Dividing both maximands by $\lVert X^\mathrm{T}y \rVert^2$ does not affect the solution of the maximization problem, hence we may replace $X^\mathrm{T}y$ by $r_1 = X^\mathrm{T}y/\lVert X^\mathrm{T}y \rVert$. We are also free to choose unit-length r, since neither criterion depends on the length of r. Notice that this choice differs from the treatment in Section 3, where we added the constraint $r^\mathrm{T}M_1r = 1$ rather than absorbing it in the maximand. Finally, we prove below that $r^\mathrm{T}RR^+r = k_ir^\mathrm{T}r_1r_1^\mathrm{T}r$ for some constant $k_i > 0$ that is independent of r. This result is immediate for the second factor with $k_2 = 1$, because $r_1r_1^+ = r_1(r_1^\mathrm{T}r_1)^{-1}r_1^\mathrm{T} = r_1r_1^\mathrm{T}$. Applying these substitutions and simplifying gives
We observe that the maximizing solutions for PLS and SIMPLS coincide, because the minimum of the denominator in the PLS criterion coincides with the maximum of the numerator, as both $k_i$ and the denominator are positive. Thus the normalized weight vectors defining the ith factor of PLS and of SIMPLS are identical.
It remains to prove that $r^\mathrm{T}RR^+r = k_ir^\mathrm{T}r_1r_1^\mathrm{T}r$. Define $R_* = [r_1, p_1, p_2, \ldots, p_{i-2}]$. From the Krylov series properties of the PLS weight vectors and the loading vectors, especially the fact that the jth term of the latter series, $(X^\mathrm{T}X)^jX^\mathrm{T}y$, corresponds to the (j+1)st term of the former series,3 it follows that $R_*$ and R have the same column space. Because $P^\mathrm{T}r = 0_{i-1}$, $R_*^\mathrm{T}r = (r_1^\mathrm{T}r, 0, \ldots, 0)^\mathrm{T}$. We obtain

$r^\mathrm{T}RR^+r = r^\mathrm{T}R_*(R_*^\mathrm{T}R_*)^{-1}R_*^\mathrm{T}r = k_ir^\mathrm{T}r_1r_1^\mathrm{T}r$   (53)

where $k_i$ is the (1,1)th element of $(R_*^\mathrm{T}R_*)^{-1}$, which is indeed positive and independent of r. This concludes the proof.
For multivariate Y we may write the optimization problem for the second factor for the two methods as

PLS: $\max_r\; r^\mathrm{T}SS^\mathrm{T}r \,/\, r^\mathrm{T}(I_p - r_1r_1^\mathrm{T})r$ subject to $p_1^\mathrm{T}r = 0$   (54)

SIMPLS: $\max_r\; r^\mathrm{T}SS^\mathrm{T}r \,/\, r^\mathrm{T}r$ subject to $p_1^\mathrm{T}r = 0$   (55)

where $S \equiv X^\mathrm{T}Y$. Let the SIMPLS criterion (55) be maximum for $r = r_2^S$. This corresponds also to the maximum of the numerator in the PLS criterion (54). There is no reason, however, that it coincides with the minimum of the denominator. The value of the PLS criterion can be increased, departing from $r = r_2^S$, when the denominator is further reduced at the expense of a smaller (relative) reduction of the numerator. Thus, for multivariate Y, PLS does not usually coincide with SIMPLS beyond the first factor. An exception is when S is of rank one, i.e. when $SS^\mathrm{T}$ can be written as $ss^\mathrm{T}$ for some vector s. In that special case one may use the same reasoning as for univariate Y, showing PLS and SIMPLS to be equivalent.
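The equivalence for univariate y (and hence for rank-one S) is easy to confirm numerically. The sketch below (ours) computes the r-weights of PLS via (8) and those of SIMPLS for a univariate response and checks that, after normalization, they coincide up to sign:

    import numpy as np

    rng = np.random.default_rng(8)
    X = rng.standard_normal((20, 6)); X -= X.mean(axis=0)
    y = rng.standard_normal((20, 1)); y -= y.mean(axis=0)

    def pls_r_weights(X, Y, k):
        Xi = X.copy()
        R = np.zeros((X.shape[1], 0)); P = np.zeros((X.shape[1], 0))
        for _ in range(k):
            w = np.linalg.svd(Xi.T @ Y)[0][:, 0]
            r = w - R @ (P.T @ w)                 # equation (8)
            t = X @ r
            p = X.T @ t / (t @ t)
            R = np.column_stack([R, r]); P = np.column_stack([P, p])
            Xi = Xi - np.outer(t, Xi.T @ t / (t @ t))
        return R

    def simpls_r_weights(X, Y, k):
        S = X.T @ Y
        R = np.zeros((X.shape[1], 0)); V = np.zeros((X.shape[1], 0))
        for _ in range(k):
            r = np.linalg.svd(S)[0][:, 0]
            t = X @ r
            v = X.T @ t - V @ (V.T @ (X.T @ t))
            v = v / np.linalg.norm(v)
            R = np.column_stack([R, r]); V = np.column_stack([V, v])
            S = S - np.outer(v, v @ S)
        return R

    Rn = pls_r_weights(X, y, 4)
    Rn = Rn / np.linalg.norm(Rn, axis=0)          # normalize before comparing
    Rs = simpls_r_weights(X, y, 4)                # already unit length
    assert np.allclose(np.abs(np.sum(Rn * Rs, axis=0)), 1.0, atol=1e-6)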
REFERENCES
1. A. J. Burnham, R. Viveros and J. F. MacGregor, J. Chemometrics, 10, 31–45 (1996).
2. A. Höskuldsson, J. Chemometrics, 2, 211–228 (1988).
3. S. de Jong, Chemometrics Intell. Lab. Syst., 18, 251–263 (1993).
4. M. Stone and R. J. Brooks, J. R. Stat. Soc. B, 52, 237–269 (1990).
5. I. E. Frank and J. H. Friedman, Technometrics, 35, 109–135 (1993).
6. C. J. F. ter Braak, S. Juggins, H. J. B. Birks and H. van der Voet, in Multivariate Environmental Statistics, ed. by G. P. Patil and C. R. Rao, pp. 525–560, North-Holland, Amsterdam (1993).
7. R. Brooks and M. Stone, J. Am. Stat. Assoc., 89, 1374–1377 (1994).
8. C. J. F. ter Braak and P. F. M. Verdonschot, Aquatic Sci., 57, 255–289 (1995).
9. J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, pp. 32–33, 131–144, Wiley, New York (1988).
10. B. S. Dayal and J. F. MacGregor, J. Chemometrics, 11, 73–85 (1997).
11. A. Höskuldsson, Chemometrics Intell. Lab. Syst., 14, 139–153 (1992).
12. C. R. Rao, Linear Statistical Inference and Its Applications, 2nd edn, p. 50, Wiley, New York (1973).
13. E. Zhu and R. M. Barnes, J. Chemometrics, 9, 363–372 (1995).
14. J. Hinkle and W. Rayens, Chemometrics Intell. Lab. Syst., 30, 159–172 (1995).
15. M. C. Denham, Ph.D. Thesis, University of Liverpool (1991).
16. S. de Jong and C. J. F. ter Braak, J. Chemometrics, 8, 169–174 (1994).
17. A. R. Gourlay and G. A. Watson, Computational Methods for Matrix Eigenproblems, Wiley, New York (1973).
18. M. O. Hill, DECORANA—A FORTRAN Program for Detrended Correspondence Analysis and Reciprocal Averaging, Cornell University, Ithaca, NY (1979).