
I am using the caret package in R to train binary SVM classifiers. To reduce the number of features, I preprocess with PCA using the built-in option preProc=c("pca") when calling train(). Here are my questions:

  1. How does caret select principal components?
  2. Is there a fixed number of principal components that is selected?
  3. Are principal components selected by some amount of explained variance (e.g. 80%)?
  4. How can I set the number of principal components used for classification?
  5. (I understand that PCA should be part of the outer cross-validation to allow reliable prediction estimates.) Should PCA also be implemented in the inner cross-validation cycle (parameter estimation)?
  6. How does caret implement PCA in the cross-validation?
amoeba
jokel
  • Useful information can be found in this post on [PCA and k-fold cross-validation in caret package in R](http://stats.stackexchange.com/questions/46216/pca-and-k-fold-cross-validation-in-caret-package-in-r). – Ekaba Bisong Dec 07 '16 at 20:29

1 Answer


By default, caret keeps as many components as are needed to explain 95% of the variance (cumulatively).
You can change this cutoff with the thresh parameter.

# Example
preProcess(training, method = "pca", thresh = 0.8)

You can also set a particular number of components by setting the pcaComp parameter.

# Example
preProcess(training, method = "pca", pcaComp = 7)

If you use both parameters, pcaComp has precedence over thresh.
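
To check how many components were actually kept, you can inspect the fitted preProcess object. The following is a minimal sketch, assuming the built-in iris data as a stand-in for your own predictors; the variable names (predictors, pp_thresh, pp_fixed, pp_both) are made up for illustration.

```r
library(caret)

# Numeric predictors only; iris is just a placeholder data set.
predictors <- iris[, 1:4]

# Keep as many components as needed to reach 80% cumulative variance.
pp_thresh <- preProcess(predictors, method = "pca", thresh = 0.8)

# Keep exactly two components, regardless of explained variance.
pp_fixed <- preProcess(predictors, method = "pca", pcaComp = 2)

# Both arguments supplied: pcaComp should take precedence over thresh.
pp_both <- preProcess(predictors, method = "pca", thresh = 0.8, pcaComp = 3)

# Number of components each fitted object retains.
print(pp_thresh$numComp)
print(pp_fixed$numComp)
print(pp_both$numComp)

# Apply the fitted transformation to obtain the component scores.
scores <- predict(pp_fixed, predictors)
print(head(scores))
```

The numComp element of the returned object holds the number of retained components, and predict() returns one column per component.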

Please see: https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/preProcess
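
When the PCA is requested through train() rather than by calling preProcess() directly, the same options can be passed via the preProcOptions argument of trainControl(). Here is a rough sketch, assuming a two-class subset of iris as placeholder data and a linear SVM (method "svmLinear", which needs the kernlab package); the names two_class, ctrl and fit are hypothetical. As far as I understand the caret documentation, preprocessing requested this way is re-estimated inside each resample, so the PCA is part of the cross-validation used for tuning.

```r
library(caret)

set.seed(1)

# Two-class subset of iris as a stand-in for the real data.
two_class <- droplevels(subset(iris, Species != "setosa"))

ctrl <- trainControl(
  method = "cv",
  number = 5,
  # Options forwarded to preProcess(); pcaComp fixes the number of components.
  # Use list(thresh = 0.8) instead to select by explained variance.
  preProcOptions = list(pcaComp = 2)
)

fit <- train(
  Species ~ .,
  data = two_class,
  method = "svmLinear",   # requires the kernlab package
  preProcess = "pca",
  trControl = ctrl
)

# Number of components used by the final model's preprocessing step.
print(fit$preProcess$numComp)
```

Predictions made with predict(fit, newdata) apply the stored PCA transformation to the new data before the SVM is evaluated.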

Jacques Wainer