I've been reading through this question: "PCA and k-fold cross-validation in caret package in R". One of the answers suggested doing PCA inside the train function (through its preProcess argument) rather than beforehand on the whole dataset. However, I tested both methods locally and saw no real difference in performance. What exactly is the logic behind this recommendation? The output below is added just for context:

> getTrainPerf(pca_train_svm)
  TrainAccuracy TrainKappa    method
1     0.8024293  0.5831657 svmLinear
> #PCA right way
> pca_train_svm<-train(Survived~.,data=newtrain_imp,method="svmLinear",trControl=control,
+                      metric=metric,preProcess=c("pca"))
> getTrainPerf(pca_train_svm)
  TrainAccuracy TrainKappa    method
1     0.8036165  0.5863854 svmLinear
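
To make the two approaches concrete, here is a minimal base-R sketch of the distinction. It is not caret's internal code, only an illustration under stated assumptions: the data are simulated, logistic regression stands in for the SVM, the number of retained components (`n_comp`) is fixed by hand, and the helper `fold_accuracy` is a hypothetical name. In (a) the PCA is estimated once on all rows, so every held-out fold has already influenced the rotation; in (b) the PCA is re-estimated on each fold's training rows and only applied to the held-out rows.

```r
# Illustrative sketch only: simulated data, base R, no caret.
set.seed(1)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("v", seq_len(p))
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n) > 0, "yes", "no"))

k <- 5                                   # number of CV folds (assumed)
n_comp <- 3                              # components kept (assumed, fixed by hand)
folds <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: fit a logistic model on training scores and
# return the classification accuracy on the held-out scores.
fold_accuracy <- function(train_scores, y_train, test_scores, y_test) {
  fit <- glm(y ~ ., data = data.frame(train_scores, y = y_train),
             family = binomial)
  prob <- predict(fit, newdata = data.frame(test_scores), type = "response")
  pred <- ifelse(prob > 0.5, levels(y_train)[2], levels(y_train)[1])
  mean(pred == y_test)
}

# (a) PCA estimated once on ALL rows, then cross-validation on the scores.
pca_all <- prcomp(x, center = TRUE, scale. = TRUE)
scores_all <- pca_all$x[, seq_len(n_comp), drop = FALSE]
acc_outside <- sapply(seq_len(k), function(i) {
  test <- folds == i
  fold_accuracy(scores_all[!test, , drop = FALSE], y[!test],
                scores_all[test, , drop = FALSE], y[test])
})

# (b) PCA re-estimated on each fold's training rows only, then applied
#     to the held-out rows with predict().
acc_inside <- sapply(seq_len(k), function(i) {
  test <- folds == i
  pca_i <- prcomp(x[!test, , drop = FALSE], center = TRUE, scale. = TRUE)
  tr <- pca_i$x[, seq_len(n_comp), drop = FALSE]
  te <- predict(pca_i, newdata = x[test, , drop = FALSE])[, seq_len(n_comp), drop = FALSE]
  fold_accuracy(tr, y[!test], te, y[test])
})

print(c(outside = mean(acc_outside), inside = mean(acc_inside)))
```

Only (b) keeps the held-out rows completely unseen during preprocessing. Since PCA never looks at the outcome, the two estimates can easily come out close, which would be consistent with the near-identical numbers above, but that is not guaranteed in general.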
NelsonGon
  • See this https://stats.stackexchange.com/questions/55718 for the general discussion of this issue. – amoeba Nov 26 '18 at 11:00
  • Thanks for the link. I would also love to understand what is really going on that is specific to caret. – NelsonGon Nov 26 '18 at 11:07
  • If this is specific to the caret package, then the question is better suited to Stack Overflow, which covers coding, including statistical coding. – ReneBt Nov 27 '18 at 11:24
