
Suppose we have a dataset with $N=100$ observations. We do $K$-fold cross-validation with $K=10$ and $K=100$.
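For concreteness, here is roughly what I mean (a minimal sketch; the synthetic data, the logistic-regression classifier, and the use of scikit-learn are just placeholders, not part of the actual problem):

```python
# Minimal sketch of the setup above (hypothetical data and classifier):
# estimate accuracy with K=10 and K=100 folds on N=100 observations.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = LogisticRegression()

for k in (10, 100):  # with N=100, K=100 is leave-one-out
    scores = cross_val_score(clf, X, y, cv=KFold(n_splits=k, shuffle=True, random_state=0))
    print(f"K={k}: mean accuracy = {scores.mean():.2f}")
```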

In the first case ($K=10$), the classification decisions are sampled (can I say it like this?) from a multinomial distribution. The variance is $np(1-p) = 10\cdot0.5(1-0.5) = 2.5$.

In the second case ($K=100$), the classification decisions are sampled from a binomial distribution. The variance is $np(1-p) = 100\cdot0.5(1-0.5)=25$.

Hence, the variance of the first estimate is one tenth of the variance of the second estimate.

Is this correct so far? Please comment.

Now, suppose we get accuracies of $60\%$ and $70\%$. Is it possible to say that the second classifier fared better? I think not, since the variance was higher for the second classifier. However, how can I show this mathematically?
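One naive way I can think of (treating each accuracy as based on $100$ independent decisions, and the two sets of decisions as independent of each other, which may well be wrong) is to compare the difference to its standard error:

$$\operatorname{SE}(\hat p_2 - \hat p_1) = \sqrt{\frac{\hat p_1(1-\hat p_1)}{N} + \frac{\hat p_2(1-\hat p_2)}{N}} = \sqrt{\frac{0.6\cdot0.4}{100} + \frac{0.7\cdot0.3}{100}} \approx 0.067,$$

so the observed difference of $0.10$ is only about $1.5$ standard errors, which would not be conventionally significant even under this optimistic independence assumption.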

  • ... or should I just see that $\sqrt{25}=5$ and therefore the $60\%$ vs $70\%$ gap was caused by something other than the choice of $K$ (the difference is two times the standard deviation)? – bino May 19 '15 at 18:25
  • Cross-validation folds are not mutually independent; therefore the number of "hits" is not binomially distributed. It is one of the drawbacks of cross-validation that it does not (straightforwardly) allow testing for significance. A way out may be provided by bootstrap confidence intervals on accuracy differences. – A. Donda May 19 '15 at 21:05
  • Why not? Each sample belongs to one fold only. Also, the answer to this question http://stats.stackexchange.com/questions/90902/why-is-leave-one-out-cross-validation-loocv-variance-about-the-mean-estimate-f?rq=1 says that the number of misclassifications (the complement of accuracy; it does not matter) is a binomial random variable in LOOCV. I am attempting to compare two models. – bino May 19 '15 at 21:15
  • Yes, but the training data in the different folds overlap, which means the resulting classifiers are not statistically independent. You're right, people often assume a binomial distribution, but it is wrong nonetheless – a fact not appreciated by the machine learning community as far as I know. – A. Donda May 19 '15 at 21:17
  • I see. How should I then formulate the high-variance argument? – bino May 19 '15 at 21:22
  • My colleagues and I are using cross-validated classification accuracy in neuroimaging, and we've found that the resulting sampling distribution can be very weird, in extreme cases even bimodal. If you want to get a feeling for the variance, I'd say simulation is the way to go. – A. Donda May 19 '15 at 21:27
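Following the simulation suggestion in the last comment, a rough sketch could look like this; the data-generating process, the classifier, and the number of repetitions are entirely made-up illustrative choices:

```python
# Sketch of the simulation idea: repeatedly draw a dataset of N=100 observations
# from a fixed (made-up) generating process, run 10-fold CV, and look at the
# empirical sampling distribution of the accuracy estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)

def draw_dataset(n=100, d=5, shift=0.7):
    """Two overlapping Gaussian classes -- an illustrative generating process."""
    y = rng.randint(0, 2, size=n)
    X = rng.randn(n, d) + shift * y[:, None]
    return X, y

accuracies = []
for rep in range(200):
    X, y = draw_dataset()
    cv = KFold(n_splits=10, shuffle=True, random_state=rep)
    accuracies.append(cross_val_score(LogisticRegression(), X, y, cv=cv).mean())

accuracies = np.array(accuracies)
print(f"empirical mean = {accuracies.mean():.3f}, empirical sd = {accuracies.std(ddof=1):.3f}")
# The empirical sd can then be compared with the naive binomial value
# sqrt(p * (1 - p) / 100) that the variance argument in the question relies on.
```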

0 Answers