I am testing various techniques for dealing with strong multi-collinearity (MC) in a regression problem.
Various comparison papers have been written on competing techniques such as Ridge Regression (RR) and Principal Components Regression (PCR). There seems to be no clear winner, with the best technique apparently being problem specific. However, one thing that bothers me about the PCR approach is the somewhat arbitrary way in which one simply excludes the eigenvectors with the smallest eigenvalues: as Hadi and Ling show, even the eigenvector with the smallest eigenvalue may have strong predictive power, while the largest eigenvectors may have none.
"Some Cautionary notes on the use of Principal Components Regression" by Hadi and Ling. (PDF)
They also show that the SSE can be vastly improved by adding seemingly insignificant eigenvectors to the model.
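To make that point concrete, here is a small simulation I put together (a sketch under my own assumptions, not an example taken from the paper), in which the response is driven entirely by the component with the smallest eigenvalue, so a PCR that keeps only the leading components misses essentially all of the predictive signal:

```python
# Toy illustration: the response loads on the SMALLEST-eigenvalue component,
# so PCR that keeps only the leading components misses the signal.
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two nearly collinear predictors plus an unrelated one -> strong MC.
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
Xc = X - X.mean(axis=0)

# Principal components via SVD; the columns of Vt.T are the eigenvectors of X'X,
# ordered from largest to smallest eigenvalue.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = Xc @ Vt.T                          # component scores

# The response depends only on the last (smallest-eigenvalue) component.
y = T[:, -1] / T[:, -1].std() + 0.1 * rng.normal(size=n)

def sse_keeping(k):
    """SSE of a PCR fit that keeps only the first k components."""
    Tk = T[:, :k]
    beta, *_ = np.linalg.lstsq(Tk, y, rcond=None)
    resid = y - Tk @ beta
    return resid @ resid

for k in (1, 2, 3):
    print(f"components kept = {k:d}   SSE = {sse_keeping(k):8.2f}")
# The SSE collapses only when the "insignificant" smallest component is included.
```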
In their discussion they highlight two papers that try to address this second deficiency, Lott (1973) and Gunst and Mason (1973), but it has been shown that the Lott technique fails to pick the "correct" eigenvectors in the presence of strong MC, and my problem has strong MC.
Do you know of a paper describing a method that can select the optimal set of eigenvectors even in the presence of strong MC? Or of more recent papers that compare PCR and RR?