Suppose you are given a medium-sized dataset and you ran k-fold cross-validation once. You notice that the scores on the individual folds differ noticeably. Which validation strategy is the most practical choice here?
I thought about sticking with k-fold: if the scores differ from fold to fold, I can probably switch to repeated k-fold, as I've heard it's a good fit for medium-sized datasets. I based that on this passage:
A noisy estimate of model performance can be frustrating as it may not be clear which result should be used to compare and select a final model to address the problem.
One solution to reduce the noise in the estimated model performance is to increase the k-value. This will reduce the bias in the model’s estimated performance, although it will increase the variance: e.g. tie the result more to the specific dataset used in the evaluation.
An alternate approach is to repeat the k-fold cross-validation process multiple times and report the mean performance across all folds and all repeats. This approach is generally referred to as repeated k-fold cross-validation.
(Jason Brownlee, MachineLearningMastery.com)
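To illustrate the noise the quote is talking about, here is a small sketch of my own (same synthetic dataset as the example below, with 10 folds assumed): a single k-fold evaluation is run with three different shuffles, and the mean accuracy shifts from run to run, which is exactly what repeated k-fold averages out.

# single 10-fold CV run with three different shuffles:
# the mean score shifts between runs, illustrating the noise
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
model = LogisticRegression()
for seed in (1, 2, 3):
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    print('run %d: mean accuracy %.3f' % (seed, mean(scores)))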
But I was wondering if leave-one-out (LOO) or a plain holdout split would have been better (a comparison sketch follows the example below).
Here is the minimal reproducible code I can compare to:
# evaluate a logistic regression model using repeated k-fold cross-validation
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
# create dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# prepare the cross-validation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# create model
model = LogisticRegression()
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
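For comparison, here is a minimal sketch of my own that scores the same model with leave-one-out and with a single holdout split (the 70/30 ratio is my assumption, not taken from anywhere). LOO refits the model once per sample, so it gets slow on larger data, and a single holdout produces one noisy estimate rather than a distribution of scores.

# evaluate the same model with leave-one-out and a single holdout split
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
model = LogisticRegression()
# leave-one-out: one fit per sample, each scored on a single held-out example
loo_scores = cross_val_score(model, X, y, scoring='accuracy', cv=LeaveOneOut(), n_jobs=-1)
print('LOO accuracy: %.3f' % mean(loo_scores))
# holdout: a single 70/30 split (assumed ratio), giving one noisy estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print('Holdout accuracy: %.3f' % model.fit(X_train, y_train).score(X_test, y_test))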