
In MATLAB's plsregress function, and in many other statistics toolboxes, a BETA vector is returned that simplifies the regression problem to (excluding the intercept term in BETA):

Y=X*BETA

In almost all documentation, this BETA vector is used to predict the original data and to calculate residuals from there. That covers the data used to fit the regression. However, I couldn't find any documentation or method for predicting unknowns, and using solely the BETA vector for this purpose feels wrong. In other words, if I were to calculate Y from a different X, what steps should I follow? Is there a clear guide somewhere?
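For plsregress specifically, the returned BETA has the intercept as its first row, so prediction for new data is the same `[ones(n,1) Xnew]*BETA` product used for the training data. A minimal NumPy sketch of just that step (the BETA values here are made up for illustration, not the output of an actual PLS fit):

```python
import numpy as np

# Hypothetical BETA laid out as plsregress returns it:
# first row is the intercept, remaining rows are the
# coefficients for each predictor column.
BETA = np.array([[2.0],    # intercept
                 [0.5],    # coefficient for x1
                 [-1.0]])  # coefficient for x2

X_new = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# Same as MATLAB's [ones(n,1) Xnew] * BETA
Y_pred = np.hstack([np.ones((X_new.shape[0], 1)), X_new]) @ BETA
# -> [[0.5], [-0.5]]
```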

What is the case in cross validation?

Edit: Also in this article http://www.sciencedirect.com/science/article/pii/0003267086800289 the use of BETA is not mentioned for the PLS part.

Months later edit: Now I understand that the source of my confusion is the difference between the two main algorithms: NIPALS and SIMPLS.

gunakkoc
  • Why does it feel wrong? – amoeba May 11 '16 at 21:12
  • Because in PCA-based methods I am expecting some kind of projection. For example, in PCR the scores used to predict the response are obtained as UnknownData*LoadingsFromTheModel, and these scores are used for prediction. For PLS, which to me is a more complicated PCA-based method, that much simplification is counterintuitive. – gunakkoc May 11 '16 at 21:21
  • Take PCR. You take your newData and multiply with loadings to get scores of several components. Then you take these scores and multiply with regression weights to get the predicted values. This is mathematically equivalent to taking newData and multiplying it only once by loadings*weights, which is just one beta vector. Does that make sense? – amoeba May 11 '16 at 21:24
  • The final question is unclear: could you elaborate on it? All the preceding questions are already answered by your formula $y=x\beta$. That's what the multiple regression model tells you to use for predicting $y$ from any $x$. There are literally *thousands* of clear guides: they are books or book chapters and usually have titles like "introduction to regression." – whuber May 11 '16 at 21:36
  • Yes, it makes sense mathematically, but is it statistically sound? On several old forum posts (I couldn't find the links), I remember reading that 'that BETA vector is for that data' and that statistical re-weighting is necessary for predictions. Combining that with the code I found from the Milano Chemometrics and QSAR Research Group, which does not use BETA, confused me. – gunakkoc May 11 '16 at 21:55
  • Here is the very simplified version https://paste.ee/p/qa9QV (fixed the comments on the code) – gunakkoc May 11 '16 at 21:55
  • @whuber: while the answer is that yes, beta is what you need for prediction, I think it is a valid question and I'd give answering it a try: how to get to the "PLS Beta" from the usual terminology/matrices used for PLS in chemometrics. – cbeleites unhappy with SX May 18 '16 at 14:31
  • Alternatively, Mevik, B.-H. & Wehrens, R., "The pls Package: Principal Component and Partial Least Squares Regression in R", Journal of Statistical Software, 18, 1-24 (2007) gives a very nice and concise explanation. (Which is of course what @amoeba already said, just a bit more readable, as it is not restricted to comment format.) – cbeleites unhappy with SX May 18 '16 at 14:38
  • I agree with @cbeleites and voted to reopen too. Looking forward to her answer. – amoeba May 18 '16 at 16:21
  • @cbeleites: I've re-opened the question then. – Scortchi - Reinstate Monica May 21 '16 at 17:29
  • I was unable to express my question; the part that was counterintuitive was the way that deflation of the original X matrix in the NIPALS model ends up in a single vector/matrix for regression. Now that I've gone through the whole math and it checks out, everything is crystal clear, especially with the SIMPLS algorithm. – gunakkoc Feb 03 '17 at 13:21
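The equivalence amoeba describes above (project onto the loadings, then regress the scores, versus multiplying once by loadings*weights) is plain matrix associativity. A small NumPy check, with arbitrary made-up loadings and weights standing in for a fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)
X_new = rng.standard_normal((5, 4))  # new data, 4 original variables
L = rng.standard_normal((4, 2))      # loadings: 4 variables -> 2 components
w = rng.standard_normal((2, 1))      # regression weights on the scores

# Two steps: project to component scores, then regress the scores
y_two_step = (X_new @ L) @ w

# One step: collapse loadings and weights into a single beta vector
beta = L @ w
y_one_step = X_new @ beta

assert np.allclose(y_two_step, y_one_step)
```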

1 Answer


I'm mostly using the papers to extend @amoeba's comment into an answer here:

Let's start with the PLS X

$\mathbf X = \mathbf T \mathbf P' + \mathbf E$ and $\mathbf T = \mathbf X \mathbf W'$

and the Y matrices

$\mathbf Y = \mathbf U \mathbf Q' + \mathbf F$

(outer relations)
(take care to construct the weights $\mathbf W'$ and $\mathbf Q'$ so they refer directly to $\mathbf X$ and $\mathbf Y$, not to deflated matrices!)

Regression can then take place on the X and Y scores: $\hat u = t b$ (inner relation), thus

$\mathbf Y = \mathbf T \mathbf B \mathbf Q' + \mathbf F$

$\mathbf{\hat Y} = \mathbf X \mathbf W' \mathbf B \mathbf Q'$

Now, the last three matrices ($\mathbf W' \mathbf B \mathbf Q'$) are all part of the PLS model parameters. We can therefore introduce one matrix $\mathbf B' = \mathbf W' \mathbf B \mathbf Q'$ which gives PLS coefficients in analogy to the usual MLR coefficients and write
$\mathbf{\hat Y} = \mathbf X \mathbf B'$
which is the usual form of a linear regression model.

Your prediction can either use these "shortcut" coefficients, or the 3 steps of calculating

  1. X scores $\mathbf{\hat T} = \mathbf X \mathbf W'$, then
  2. Y scores $\mathbf{\hat U} = \mathbf{\hat T} \mathbf B$, and finally
  3. $\mathbf{\hat Y} = \mathbf{\hat U} \mathbf Q'$
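The three steps collapse into the shortcut coefficients by associativity. A NumPy sketch with randomly generated stand-in matrices (not a fitted PLS model) verifying that both routes give the same $\mathbf{\hat Y}$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, a, m = 6, 3, 2                     # variables, latent components, responses
X_new = rng.standard_normal((10, p))
W = rng.standard_normal((p, a))       # X weights (referring to undeflated X)
B = np.diag(rng.standard_normal(a))   # inner-relation coefficients
Q = rng.standard_normal((m, a))       # Y loadings

# Three-step prediction
T_hat = X_new @ W          # 1. X scores
U_hat = T_hat @ B          # 2. Y scores
Y_hat_steps = U_hat @ Q.T  # 3. predicted Y

# Shortcut: B' = W B Q'
B_prime = W @ B @ Q.T
Y_hat_shortcut = X_new @ B_prime

assert np.allclose(Y_hat_steps, Y_hat_shortcut)
```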

update: this procedure of modeling both $\mathbf X$ and $\mathbf Y$ with latent variables and scores is known as PLS2. In contrast, PLS1 models only one dependent variable $\mathbf y$ (or $\mathbf Y^{(n \times 1)}$) at a time, so that no Y-scores are obtained. Multiple dependent variates can be modeled by separate PLS1 models -- one per variate.

Whether multiple PLS1 models or a single PLS2 model is better depends on the application, e.g. on whether the dependent variates are correlated and whether an underlying structure with few(er) latent variables is expected.


In practice, you also need to take care of centering (standard practice) and possible scaling (less standard practice) of $\mathbf X$ and $\mathbf Y$.
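Centering amounts to fitting on mean-subtracted data and restoring the means at prediction time, which is also where the intercept row in plsregress's BETA comes from. A sketch of that bookkeeping, with ordinary least squares standing in for the PLS fit:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 3))
Y = X @ np.array([[1.0], [2.0], [-1.0]]) + 5.0  # exact linear relation

x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)

# Fit on centered data (least squares standing in for the PLS inner fit)
B, *_ = np.linalg.lstsq(X - x_mean, Y - y_mean, rcond=None)

X_new = rng.standard_normal((4, 3))
# Prediction: center new X with the *training* means, add back the Y mean
Y_pred = (X_new - x_mean) @ B + y_mean
```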


For cross validation, the prediction works exactly the same way as for unknown cases: you fit the model on your training cases and then predict the left out cases like any other unknown case.

(Assuming this is not asking whether shortcut solutions exist to update a PLS model for exchanging one case during leave-one-out cross validation)
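Concretely, leave-one-out prediction treats each held-out row exactly like an unknown case: refit without it, then predict it. A sketch of the loop, with centered least squares again standing in for the PLS fit (swap in any PLS implementation):

```python
import numpy as np

def fit(X, Y):
    # Stand-in for a PLS fit: centered least squares
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    B, *_ = np.linalg.lstsq(X - x_mean, Y - y_mean, rcond=None)
    return B, x_mean, y_mean

def predict(model, X_new):
    B, x_mean, y_mean = model
    return (X_new - x_mean) @ B + y_mean

rng = np.random.default_rng(3)
X = rng.standard_normal((15, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + rng.standard_normal((15, 1)) * 0.1

# Leave-one-out: refit without row i, predict row i as an unknown case
residuals = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    model = fit(X[mask], Y[mask])
    residuals.append(Y[i] - predict(model, X[i:i+1])[0])
residuals = np.array(residuals)
```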

cbeleites unhappy with SX
  • Your explanation makes perfect sense. However, in many of the resources, such as in http://www.eigenvector.com/evriblog/?p=86, the author regresses the X scores directly onto the Y values rather than the Y scores. Is this a matter of my decision? Are there any pros and cons? – gunakkoc Oct 03 '16 at 12:07
  • @theGD: the blog entry you link focuses on the $\mathbf X$ side, and even though once in a while Barry uses $\mathbf Y$, the formulation uses a single-column $\mathbf y$ (this is more clearly stated in "The PLS model space revisited" = ref. [4]). A single column doesn't need Y-scores. See also the explanation of the difference between PLS1 (using a single-column $\mathbf y$ and thus no Y-scores) and PLS2 with Y-scores. With multi-column Y it is your decision whether to build multiple PLS1 models or one PLS2 model. See also e.g. http://www.eigenvector.com/faq/index.php?id=93 – cbeleites unhappy with SX Oct 03 '16 at 15:43
  • ... and see the update. – cbeleites unhappy with SX Oct 03 '16 at 17:47
  • The first article you referenced is for the NIPALS algorithm, while the second article uses SIMPLS. In SIMPLS there is no need for Y-scores in the prediction step (both PLS1 and PLS2), and the equation is simply Y = X*R*Q', where R is the weights (not to be confused with W in NIPALS) and Q is the Y loadings. The confusion I had a long time ago was due to the differences between these algorithms. Furthermore, the particular function "plsregress" in the MATLAB library uses the SIMPLS algorithm. – gunakkoc Feb 03 '17 at 13:05
  • Thus, can you please update your answer to indicate that your "shortcut" is available for NIPALS, and add a little section on SIMPLS (De Jong, S., 1993. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems), which is more commonly used in many libraries I have encountered. IMHO, that update will make things clearer for future readers. Lastly, rather than suggesting an update, should I answer my own question? – gunakkoc Feb 03 '17 at 13:10
  • @theGD: I'd suggest that you should put your new insights into your own answer. I cannot tell you that much about the shortcuts: IIRC there exist shortcuts for leave-one-out (via hat matrix) but I'd need to read that up: my data comes in clusters (with repeated measurements), so these approaches are not valid for the data I have: no shortcuts for me. Which is a pity because these approaches are connected to analytical expressions for uncertainty... – cbeleites unhappy with SX Feb 06 '17 at 17:44