
The underlying model of PLS is that a given $n \times m$ matrix $X$ and $n$-vector $y$ are related by $$X = T P' + E,$$ $$y = T q' + f,$$ where $T$ is a latent $n \times k$ matrix of scores, $P$ is $m \times k$, $q$ is $1 \times k$, and $E, f$ are noise terms (assuming $X, y$ are centered).
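To make the model concrete, here is a minimal numpy sketch that draws data from it. The function name `simulate_pls_model` is hypothetical, and drawing $T, P, q$ as standard normals is an illustrative assumption, not something the model prescribes.

```python
import numpy as np

def simulate_pls_model(n, m, k, sigma_e, sigma_f, rng):
    """Draw (X, y, T, P, q) from X = T P' + E, y = T q' + f.

    Illustrative assumptions: T, P, q are standard normal draws, and
    E, f are i.i.d. zero-mean normal noise with std sigma_e, sigma_f.
    """
    T = rng.standard_normal((n, k))
    T -= T.mean(axis=0)                      # centered latent scores, n x k
    P = rng.standard_normal((m, k))          # X-loadings, m x k
    q = rng.standard_normal((1, k))          # y-loadings, 1 x k
    E = sigma_e * rng.standard_normal((n, m))
    f = sigma_f * rng.standard_normal(n)
    X = T @ P.T + E                          # n x m
    y = (T @ q.T).ravel() + f                # length-n vector
    # Center the observed data, matching the centering assumption.
    return X - X.mean(axis=0), y - y.mean(), T, P, q


rng = np.random.default_rng(0)
X, y, T, P, q = simulate_pls_model(n=100, m=20, k=3, sigma_e=0.1, sigma_f=0.1, rng=rng)
```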

PLS produces estimates of $T, P, q$, and a 'shortcut' vector of regression coefficients, $\hat{\beta}$ such that $y \sim X \hat{\beta}$. I would like to find the distribution of $\hat{\beta}$ under some simplifying assumptions, which should probably include the following:
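As one concrete variant, PLS1 via NIPALS produces the shortcut coefficients as $\hat{\beta} = W (\hat{P}' W)^{-1} \hat{q}'$, where $W$ holds the weight vectors. The sketch below assumes centered $X, y$; `pls1_nipals` is a hypothetical name, not a library function. The usage lines check two known identities: $X\hat{\beta} = \hat{T}\hat{q}'$, and with $k = m$ (full column rank) $\hat{\beta}$ reduces to ordinary least squares.

```python
import numpy as np

def pls1_nipals(X, y, k):
    """PLS1 via NIPALS on centered X (n x m) and y (length n).

    Returns beta_hat = W (P'W)^{-1} q together with the estimated scores
    T_hat, weights W, X-loadings P_hat and y-loadings q_hat.
    """
    Xa = X.astype(float)              # working copy, deflated each step
    n, m = X.shape
    W = np.empty((m, k)); P_hat = np.empty((m, k))
    T_hat = np.empty((n, k)); q_hat = np.empty(k)
    for a in range(k):
        w = Xa.T @ y                  # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xa @ w                    # score vector
        tt = t @ t
        P_hat[:, a] = Xa.T @ t / tt   # X-loading
        q_hat[a] = y @ t / tt         # y-loading (scores are mutually orthogonal)
        W[:, a], T_hat[:, a] = w, t
        Xa = Xa - np.outer(t, P_hat[:, a])   # deflate X
    beta_hat = W @ np.linalg.solve(P_hat.T @ W, q_hat)
    return beta_hat, T_hat, W, P_hat, q_hat


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5)); X -= X.mean(axis=0)
y = rng.standard_normal(50); y -= y.mean()
beta_hat, T_hat, W, P_hat, q_hat = pls1_nipals(X, y, k=2)
beta_full = pls1_nipals(X, y, k=5)[0]       # k = m
```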

  1. The model is correct, i.e. $X = T P' + E$, $y = T q' + f$ for unknown $T, P, q$;
  2. The number of latent factors, $k$, is known and used in the PLS algorithm;
  3. The actual error terms are i.i.d. zero-mean normal with known variances.
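Under assumptions 1 to 3, the sampling distribution of $\hat{\beta}$ can at least be approximated by Monte Carlo: hold $T, P, q$ fixed, redraw only the noise $E, f$, refit, and summarize the replicates. This is a minimal sketch using PLS1 (NIPALS) as the chosen variant; the function names and the parameter values in the usage lines are illustrative assumptions.

```python
import numpy as np

def pls1_beta(X, y, k):
    """PLS1 (NIPALS) coefficients W (P'W)^{-1} q for centered X, y."""
    Xa = X.astype(float)
    m = X.shape[1]
    W = np.empty((m, k)); P = np.empty((m, k)); q = np.empty(k)
    for a in range(k):
        w = Xa.T @ y
        W[:, a] = w / np.linalg.norm(w)
        t = Xa @ W[:, a]
        P[:, a] = Xa.T @ t / (t @ t)
        q[a] = y @ t / (t @ t)
        Xa = Xa - np.outer(t, P[:, a])    # deflate X
    return W @ np.linalg.solve(P.T @ W, q)


def monte_carlo_betas(T, P, q, sigma_e, sigma_f, k, n_rep, rng):
    """Replicates of beta_hat with T, P, q fixed and only E, f redrawn."""
    n, m = T.shape[0], P.shape[0]
    X0, y0 = T @ P.T, T @ q               # noiseless parts of X and y
    betas = np.empty((n_rep, m))
    for r in range(n_rep):
        X = X0 + sigma_e * rng.standard_normal((n, m))
        y = y0 + sigma_f * rng.standard_normal(n)
        betas[r] = pls1_beta(X - X.mean(axis=0), y - y.mean(), k)
    return betas


rng = np.random.default_rng(1)
n, m, k = 60, 8, 2
T = rng.standard_normal((n, k))
P = rng.standard_normal((m, k))
q = rng.standard_normal(k)                # y-loadings stored as a length-k vector
betas = monte_carlo_betas(T, P, q, sigma_e=0.2, sigma_f=0.2, k=k, n_rep=500, rng=rng)
beta_mean = betas.mean(axis=0)
beta_cov = np.cov(betas, rowvar=False)   # m x m empirical covariance
```

This gives an empirical picture for one fixed $(T, P, q)$ only; it is not a closed-form result.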

This question is somewhat underdefined because there are scores of variants of 'the' PLS algorithm, but I would accept results for any of them. I would also accept guidance on how to estimate the distribution of $\hat{\beta}$ via e.g. a bootstrap, but perhaps that is a separate question.
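For a single response $y$, at least some of these variants agree: SIMPLS coincides with PLS1, and the PLS1 coefficients equal the least-squares fit with $\beta$ restricted to the Krylov space $\mathrm{span}\{s, Ss, \dots, S^{k-1}s\}$, where $S = X'X$ and $s = X'y$. Because that space itself depends on $y$, $\hat{\beta}$ is a nonlinear function of $y$, which is one reason its exact distribution is hard to derive. The sketch below checks the equivalence numerically; both function names are hypothetical.

```python
import numpy as np

def pls1_beta_nipals(X, y, k):
    """PLS1 (NIPALS) coefficients W (P'W)^{-1} q for centered X, y."""
    Xa = X.astype(float)
    m = X.shape[1]
    W = np.empty((m, k)); P = np.empty((m, k)); q = np.empty(k)
    for a in range(k):
        w = Xa.T @ y
        W[:, a] = w / np.linalg.norm(w)
        t = Xa @ W[:, a]
        P[:, a] = Xa.T @ t / (t @ t)
        q[a] = y @ t / (t @ t)
        Xa = Xa - np.outer(t, P[:, a])
    return W @ np.linalg.solve(P.T @ W, q)


def pls1_beta_krylov(X, y, k):
    """Least squares for y ~ X b with b restricted to the Krylov space
    span{s, S s, ..., S^(k-1) s}, where S = X'X and s = X'y."""
    S, s = X.T @ X, X.T @ y
    K = np.empty((X.shape[1], k))
    v = s
    for j in range(k):
        K[:, j] = v
        v = S @ v
    Q, _ = np.linalg.qr(K)                      # orthonormal basis, same span
    c = np.linalg.solve(Q.T @ S @ Q, Q.T @ s)   # normal equations in that basis
    return Q @ c


rng = np.random.default_rng(2)
X = rng.standard_normal((50, 10)); X -= X.mean(axis=0)
y = rng.standard_normal(50); y -= y.mean()
b_nipals = pls1_beta_nipals(X, y, 3)
b_krylov = pls1_beta_krylov(X, y, 3)
```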

shabbychef


Do you know the article PLS-regression: a basic tool of chemometrics? The derivation of standard errors and confidence intervals for the PLS parameters is described in §3.11.

I generally rely on the bootstrap for computing CIs, as suggested in, e.g., Abdi, H., Partial least squares regression and projection on latent structure regression (PLS Regression). See also the plspm package and its accompanying textbook, PLS Path Modeling with R.
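A generic version of this bootstrap approach, not necessarily the exact procedure in the references, is a case-resampling (pairs) bootstrap: resample rows of $(X, y)$ with replacement, re-center, refit PLS1 with the same $k$, and read off percentile intervals and standard errors. The function names and simulated data below are illustrative assumptions.

```python
import numpy as np

def pls1_beta(X, y, k):
    """PLS1 (NIPALS) coefficients W (P'W)^{-1} q for centered X, y."""
    Xa = X.astype(float)
    m = X.shape[1]
    W = np.empty((m, k)); P = np.empty((m, k)); q = np.empty(k)
    for a in range(k):
        w = Xa.T @ y
        W[:, a] = w / np.linalg.norm(w)
        t = Xa @ W[:, a]
        P[:, a] = Xa.T @ t / (t @ t)
        q[a] = y @ t / (t @ t)
        Xa = Xa - np.outer(t, P[:, a])
    return W @ np.linalg.solve(P.T @ W, q)


def bootstrap_percentile_ci(X, y, k, n_boot=1000, alpha=0.05, rng=None):
    """Case-resampling bootstrap of beta_hat with percentile intervals."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = X.shape
    reps = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        reps[b] = pls1_beta(Xb - Xb.mean(axis=0), yb - yb.mean(), k)  # re-center
    lo, hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lo, hi, reps


rng = np.random.default_rng(3)
n, m, k = 80, 6, 2
T = rng.standard_normal((n, k))
X = T @ rng.standard_normal((k, m)) + 0.2 * rng.standard_normal((n, m))
y = T @ rng.standard_normal(k) + 0.2 * rng.standard_normal(n)
X, y = X - X.mean(axis=0), y - y.mean()
lo, hi, reps = bootstrap_percentile_ci(X, y, k, n_boot=300, rng=rng)
se = reps.std(axis=0, ddof=1)                # bootstrap standard errors
```

Resampling whole rows treats $X$ as random too; keeping $X$ fixed and resampling residuals would be the alternative if $X$ is regarded as fixed.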

I seem to remember that theoretical solutions are discussed in Tenenhaus, M. (1998), La régression PLS: Théorie et pratique (Technip), but I cannot check right now because I don't have the book at hand. In the meantime, there are some useful R packages, such as plsRglm.

P.S. I just came across Nicole Krämer's work through the plsdof R package.

chl

I discovered a paper by Reiss et al., Partial least squares confidence interval calculation for industrial end-of-batch quality prediction, which contains the following quote:

> The PLS prediction should be accompanied by an online confidence interval to indicate the accuracy of the prediction. The formulation of the confidence interval for the PLS prediction is an area of study that has not concluded a “gold standard”.

This paper contains a reference to the 'excellent survey of such work', Standard error of prediction for multiway PLS, by Faber and Bro, and a paper by Faber and Kowalski, Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares. I will summarize these results as they become available...
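Pending those results, a generic first-order (delta-method) error propagation can be sketched. This is not the formula from the cited papers: it holds $X$ fixed, treats only $f \sim N(0, \sigma_f^2 I)$ as random, and uses $\mathrm{Cov}(\hat{\beta}) \approx \sigma_f^2 J J'$ with $J = \partial \hat{\beta} / \partial y$ computed by central differences. The function names are hypothetical. As a sanity check, with $k = m$ PLS1 is OLS, so $J$ should equal the pseudo-inverse of $X$.

```python
import numpy as np

def pls1_beta(X, y, k):
    """PLS1 (NIPALS) coefficients W (P'W)^{-1} q for centered X, y."""
    Xa = X.astype(float)
    m = X.shape[1]
    W = np.empty((m, k)); P = np.empty((m, k)); q = np.empty(k)
    for a in range(k):
        w = Xa.T @ y
        W[:, a] = w / np.linalg.norm(w)
        t = Xa @ W[:, a]
        P[:, a] = Xa.T @ t / (t @ t)
        q[a] = y @ t / (t @ t)
        Xa = Xa - np.outer(t, P[:, a])
    return W @ np.linalg.solve(P.T @ W, q)


def jacobian_beta_wrt_y(X, y, k, h=1e-6):
    """Central-difference Jacobian d beta_hat / d y (m x n), X held fixed."""
    n, m = X.shape
    J = np.empty((m, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        # X is centered, so adding a constant to y leaves beta_hat unchanged;
        # the perturbed y therefore need not be re-centered.
        J[:, i] = (pls1_beta(X, y + e, k) - pls1_beta(X, y - e, k)) / (2 * h)
    return J


def linearized_beta_cov(X, y, k, sigma_f):
    """First-order Cov(beta_hat) from i.i.d. N(0, sigma_f^2) noise in y only."""
    J = jacobian_beta_wrt_y(X, y, k)
    return sigma_f ** 2 * (J @ J.T)


rng = np.random.default_rng(4)
X = rng.standard_normal((40, 5)); X -= X.mean(axis=0)
y = X @ rng.standard_normal(5) + 0.3 * rng.standard_normal(40); y -= y.mean()
cov_k2 = linearized_beta_cov(X, y, k=2, sigma_f=0.3)
J_full = jacobian_beta_wrt_y(X, y, k=5)      # k = m: PLS1 reduces to OLS
```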

shabbychef
(+1) Good to know, thanks. I should look again at Michel Tenenhaus's work; I'll let you know if I find something interesting. – chl Dec 16 '10 at 22:01