I'm newish to R and new to Cross Validated. I have a question about the predict method for caret "train" objects.
I'm fitting a randomForest model with the caret package and trying to produce some simple ROC curves. My understanding was that calling predict.train() on a train object and predict.randomForest() on its $finalModel component should produce the same results. However, the results are very different: in the example below, accuracy is 0.992 for the predict.train() values and 0.438 for the predict.randomForest() values.
This looks similar to two existing posts: Whether preprocessing is needed before prediction using FinalModel of RandomForest with caret package? (but I don't do any preprocessing) and Confusion between caret randomForest predict() results and reported model performance (but the difference here seems far too large to be explained by different seeds). I sanity-check both points after the code below.
Here is some reproducible code:
library(caret)

set.seed(42)                      # make the fit reproducible

# data.frame() flattens the built-in Titanic contingency table
# (columns: Class, Sex, Age, Survived, Freq)
Titanic <- data.frame(Titanic)

mc <- trainControl(method = 'boot', classProbs = TRUE,
                   returnResamp = 'final', summaryFunction = defaultSummary)

# column 4 is the outcome (Survived), so x is everything else
Titanicmodel <- train(x = Titanic[, -4], y = Titanic[, 'Survived'],
                      method = 'rf', trControl = mc, metric = 'Accuracy')

# caret predictions (no newdata supplied)
pred_train <- predict(Titanicmodel, type = 'raw')
prob_train <- predict(Titanicmodel, type = 'prob')
confusion_train <- confusionMatrix(pred_train, Titanic[, 'Survived'])
confusion_train
plot(pROC::roc(Titanic[, 'Survived'], prob_train[, 'Yes']))

# randomForest predictions straight from the finalModel (again, no newdata)
pred_final <- predict(Titanicmodel$finalModel, type = 'response')
prob_final <- predict(Titanicmodel$finalModel, type = 'prob')
confusion_final <- confusionMatrix(pred_final, Titanic[, 'Survived'])
confusion_final
plot(pROC::roc(Titanic[, 'Survived'], prob_final[, 'Yes']))
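For what it's worth, here are the two sanity checks mentioned above (a minimal sketch, run after the code; I'm assuming the finalModel expects the same predictor columns that train() was given):

# no preProcess argument was supplied above, so this should print NULL
Titanicmodel$preProcess

# hand the training predictors to the finalModel explicitly instead of
# relying on predict()'s behaviour when newdata is missing
pred_explicit <- predict(Titanicmodel$finalModel, newdata = Titanic[, -4],
                         type = 'response')
mean(pred_explicit == pred_train)   # proportion agreeing with the caret predictions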
My confusion probably has to do with one of the specific arguments I'm passing to train() or trainControl(), but I can't tell which one.
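In case the seed explanation from the second linked post were somehow in play, my understanding is that the resampling seeds can be pinned down via trainControl()'s seeds argument. A sketch, assuming the default 25 bootstrap resamples and the default grid of 3 mtry values for method = 'rf':

# one integer vector per resample (length = number of tuning combinations),
# plus a single integer for the final model fit
n_resamples <- 25                                  # default for method = 'boot'
seeds <- vector('list', n_resamples + 1)
for (i in seq_len(n_resamples)) seeds[[i]] <- sample.int(1e6, 3)  # 3 mtry values
seeds[[n_resamples + 1]] <- sample.int(1e6, 1)
mc_seeded <- trainControl(method = 'boot', classProbs = TRUE,
                          returnResamp = 'final',
                          summaryFunction = defaultSummary, seeds = seeds)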
Please let me know if there is a post that addresses this that I've missed.