
I am trying to find a good model to explain my dataset.

The problem is that I want to do leave-one-person-out cross-validation, which is not available in the Matlab Classification Learner App. So I trained different models (e.g. Tree, SVM, KNN, LDA) programmatically, using functions like fitctree, fitcsvm, fitcknn, and fitcdiscr.

Following the leave-one-person-out procedure, the best model reaches an average classification accuracy of about 70%. However, when I model the data in the App using 10-fold cross-validation, it performs much better, with accuracy, TPR, and TNR all around 98%.

It is really confusing why this is happening! I was wondering if there are some steps I am missing when I do the modeling programmatically. Or is there a way to do what the App does by writing scripts, and perhaps to customize the cross-validation scheme to leave-one-person-out?
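
For concreteness, here is a minimal sketch of the leave-one-person-out loop I am running (X, y, and subjectID are placeholders for my feature matrix, labels, and per-sample person IDs; y is assumed to be numeric or categorical):

    subjects = unique(subjectID);
    acc = zeros(numel(subjects), 1);
    for i = 1:numel(subjects)
        testIdx  = (subjectID == subjects(i));  % hold out every sample of one person
        trainIdx = ~testIdx;                    % train on all remaining people
        mdl  = fitcsvm(X(trainIdx, :), y(trainIdx));  % same idea for fitctree/fitcknn/fitcdiscr
        pred = predict(mdl, X(testIdx, :));
        acc(i) = mean(pred == y(testIdx));      % per-person accuracy
    end
    meanAccuracy = mean(acc)                    % about 70% for my best model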

Remy

1 Answer


First of all, leave-one-out and 10-fold cross-validation do not necessarily return comparable results: see, for example, 10-fold Cross-validation vs leave-one-out cross-validation. I also suggest learning about the bias-variance tradeoff in model validation, e.g. by starting off with Variance and bias in cross-validation: why does leave-one-out CV have higher variance?.

Regarding the specifics of your question:

  • When you use as many folds as you have samples in your data set, you force the k-fold cross-validation to become leave-one-out (assuming it is non-stratified cross-validation); see the sketch after the quote below.
  • The documentation of the Matlab Classification Learner App states that there is a way to export the steps done by the app as code for later reuse. So you could use this as a starting point for your own experiments. Quote:

To use the model with new data, or to learn about programmatic classification, you can export the model to the workspace or generate MATLAB® code to recreate the trained model
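
As a sketch of the first point (not your exact setup; X and y stand for your features and labels), requesting an n-fold partition and a leave-one-out partition in MATLAB produces splits of the same structure:

    n = size(X, 1);                       % number of samples
    cvLOO = cvpartition(n, 'LeaveOut');   % leave-one-out partition
    cvN   = cvpartition(n, 'KFold', n);   % n folds of one sample each: same split sizes
    cvMdl = fitcsvm(X, y, 'CVPartition', cvLOO);  % cross-validated SVM
    err   = kfoldLoss(cvMdl)              % leave-one-out misclassification rate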

mlwida
  • Thank you @steffen. Yes, I agree they would be different, but this improvement from 70% to around 100% seems like a big difference to me. Note that the cross-validation I am talking about is different from leave-one-out, since more than one sample belongs to each subject/person. I actually tried that too: I exported the best model from the App and used the same model properties to define a new model to be trained and tested in the leave-one-person-out procedure. The model performance did not improve at all. – Remy Jul 16 '18 at 16:17