
I would like to forecast stock index returns with SVM, k-NN, and neural networks. Beforehand, I want to select my inputs via kernel PCA (kPCA). Everything is done in R; for the kPCA I use kernlab.

The data I feed into the kPCA are various lagged input variables, and I want to identify the most explanatory lags through the kPCA. The data are originally in xts format, but I convert them to a matrix for the kPCA.

Can anybody tell me how to find out which of the original variables in the xts data matrix the kPCA considers most explanatory? I am able to perform the kPCA, but I cannot interpret its results.

I have read the kernlab documentation, but that did not help.
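For reference, here is a minimal sketch of the kind of call involved, using kernlab's actual `kpca()` interface. The matrix `X` below is simulated stand-in data (the question's real data would come from `as.matrix()` on the xts object); the kernel choice and `sigma` value are illustrative assumptions, not recommendations.

```r
library(kernlab)

# Stand-in for a matrix of lagged predictors (rows = dates, columns = lags),
# e.g. produced by as.matrix() from the original xts object.
set.seed(1)
X <- matrix(rnorm(200 * 6), ncol = 6,
            dimnames = list(NULL, paste0("lag", 1:6)))

# Kernel PCA with an RBF kernel; sigma = 0.1 is an arbitrary example value.
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 4)

eig(kpc)            # eigenvalues of each kernel principal component
head(rotated(kpc))  # the observations projected onto those components
```

Note that `eig()` and `rotated()` describe components and projected observations, not the original lag variables, which is exactly where the interpretation problem in the question arises.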


1 Answer

The results of a kPCA are inherently hard to interpret.

PCA lets you understand the derived variables as linear combinations of the original predictors. You lose this with kPCA, since the observations are no longer expressed in terms of (linear combinations of) features of your data set, but in terms of their "kernelized" inner products with the other elements.
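The contrast can be seen directly in R. In the sketch below (simulated stand-in data, illustrative kernel parameters), `prcomp()` returns a loadings matrix with one row per original variable, while kernlab's `pcv()` returns coefficients with one row per training observation, so there is no variable-level loading to read off:

```r
library(kernlab)

set.seed(1)
X <- matrix(rnorm(100 * 4), ncol = 4,
            dimnames = list(NULL, paste0("lag", 1:4)))

# Linear PCA: rotation has one row per ORIGINAL variable,
# so each component is directly interpretable as a mix of lags.
pc <- prcomp(X, scale. = TRUE)
pc$rotation        # 4 x 4 matrix: loadings of lag1..lag4 on each PC

# Kernel PCA: pcv has one row per OBSERVATION,
# i.e. coefficients over training points, not over lags.
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.2), features = 2)
dim(pcv(kpc))      # 100 x 2
```

As an informal workaround, some practitioners correlate each original column of `X` with the scores in `rotated(kpc)` to gauge association with a kernel component, but this is a heuristic, not a loading in the PCA sense.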

The following paper is a great help in understanding what is going on in kPCA: http://pca.narod.ru/scholkopf_kernel.pdf

Here, the authors propose a variable importance score based on SVMs: http://www.jmlr.org/papers/volume3/rakotomamonjy03a/rakotomamonjy03a.pdf

  • +1 for the interesting references. @RUser4512 makes an interesting point about the kernelized inner product. But if you choose a kernel that brings out features of interest in your data ... in other words, a kernel related to some model ... then the kernel components should make more sense. This is the gist of J.O. Ramsay's functional data analysis. See R's FDA package and associated literature. – Placidia Aug 11 '15 at 14:00