I am using k-fold cross-validation to test my trained model, but I was surprised that the accuracy is different for every fold. For instance, with 5-fold CV, each of the five folds gives a different accuracy. So which fold should I select? Is averaging the results of the five folds the right choice? Secondly, why is the data-set split ratio (70:30) different for 5-fold and 10-fold cross-validation? I would expect it to be the same.
- Split ratios should *not* be the same: they should be 5 x 4:1 for 5-fold and 10 x 9:1 for 10-fold CV. 70:30 is roughly what 3-fold gives, where you have 3 x 2:1. What is the same for all these CVs is that after all $k$ surrogate models are tested, each of your $n$ cases has been tested exactly once (see the short sketch after these comments). – cbeleites unhappy with SX May 24 '17 at 16:26
- See also https://stats.stackexchange.com/q/52274/4598 – cbeleites unhappy with SX May 26 '17 at 19:30
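A minimal sketch of the arithmetic in the first comment, assuming scikit-learn's `KFold` purely for illustration (the thread does not name any particular library): with k=5 each surrogate model trains on roughly 4/5 of the data and validates on the remaining 1/5, while k=10 gives a 9:1 split.

```python
import numpy as np
from sklearn.model_selection import KFold

n = 100  # hypothetical sample size, chosen only for illustration
for k in (5, 10):
    # Take the first of the k train/validation splits and inspect its sizes.
    train_idx, val_idx = next(KFold(n_splits=k).split(np.arange(n)))
    print(f"k={k:2d}: train {len(train_idx)}, validation {len(val_idx)}")

# Expected output:
# k= 5: train 80, validation 20   (a 4:1 split)
# k=10: train 90, validation 10   (a 9:1 split)
```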
1 Answer
I think that you may be misunderstanding how cross-validation works, as well as what it is used to do. Additional detail on your research question and methodology would help.
But TL;DR: the average of the per-fold cross-validation scores (typically mean squared error for regression, or classification accuracy) approximates the accuracy of your model on test data (a held-out sample or a new data set). How well it approximates that depends on your model, and whether that is even the quantity you are interested in depends on what you are trying to determine with cross-validation.
As for how it works, k-fold CV randomly divides your data into k folds, fits the model on k-1 of them, and evaluates the fit on the remaining fold, which serves as the validation set. This is repeated k times so that each fold is used for validation exactly once, and since each repetition trains and validates on different data, it is unsurprising (and normal) that the folds produce different results.
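To make that concrete, here is a minimal, hedged sketch (not the asker's pastebin script; it assumes scikit-learn, synthetic data, and a logistic-regression classifier purely for illustration) showing the per-fold accuracies and their mean:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic inputs/targets standing in for the asker's data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy per fold: each fold serves as the validation set exactly once,
# while the model is refit on the remaining k-1 folds each time.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("per-fold accuracies:", scores)         # five (generally different) numbers
print("mean CV accuracy:   ", scores.mean())  # the single figure to report
```

The point is that there is no single "best" fold to pick: the folds differ because they are evaluated on different validation data, and their mean (often reported alongside the standard deviation) is the usual summary.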
As for the data split ratios, I don't know enough to comment. Some clarification on what you mean by "data-set split ratio (70:30)" would be valuable, as would the ratios you observed when using 5 vs. 10 folds. Are you using a software package to do the data splitting and cross-validating, and if so, how are you calling it?

- @Upper_Case Thank you for your detailed reply. What I am doing is: after setting the inputs and targets, I apply k-fold cross-validation, and then the training and testing process starts. At the end I take the mean over the k folds to average the classification accuracy. For a clearer picture, have a look at the attached script (https://pastebin.com/rrWsTn2w); it will give you a proper idea. – Case Msee May 24 '17 at 02:35
- Using cross-validation after training is finished isn't unusual at all if the cross-validation results are to be used for, well, *validation* purposes. – cbeleites unhappy with SX May 24 '17 at 16:24
- @cbeleites Thank you for the input. I'm mostly used to using CV for choosing tuning parameters, so my intuition is a bit narrow here. I've removed the phrase about the timing of CV, since my perception isn't too useful here. – Upper_Case May 24 '17 at 18:38