I am running a neural net to predict used car prices; the sample size is 800. Using both 10-fold cross validation (repeated 10 times) and a 1/3 holdback (repeated 10 times), the training $R^2$ is about 0.60 and the validation $R^2$ is about 0.68 across all 20 runs. The smallest gap among the 20 runs is training $R^2 = 0.64$ versus validation $R^2 = 0.68$, so the training $R^2$ is always less than the validation $R^2$.
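
For reference, here is a minimal sketch of the evaluation protocol described above. It assumes scikit-learn's `MLPRegressor` and synthetic data from `make_regression` as stand-ins for my actual network and the car-price data; both are placeholders, not my real setup:

```python
# Sketch of the protocol: repeated 10-fold CV and repeated 1/3 holdback,
# comparing training and validation R^2 on each run.
# MLPRegressor and make_regression are stand-ins for the real model/data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_validate, train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)

# 10-fold cross validation, repeated 10 times with different shuffles.
for rep in range(10):
    cv = KFold(n_splits=10, shuffle=True, random_state=rep)
    scores = cross_validate(MLPRegressor(max_iter=2000, random_state=rep),
                            X, y, cv=cv, scoring="r2", return_train_score=True)
    print(f"CV rep {rep}: train R2 = {scores['train_score'].mean():.3f}, "
          f"validation R2 = {scores['test_score'].mean():.3f}")

# 1/3 holdback, repeated 10 times with different splits.
for rep in range(10):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=1/3,
                                                random_state=rep)
    model = MLPRegressor(max_iter=2000, random_state=rep).fit(X_tr, y_tr)
    print(f"holdout rep {rep}: train R2 = {model.score(X_tr, y_tr):.3f}, "
          f"validation R2 = {model.score(X_val, y_val):.3f}")
```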

I am used to seeing training $R^2$ larger than validation $R^2$, which indicates overfitting. In the past, when I have seen training $R^2$ less than validation $R^2$, it was a transient phenomenon that disappeared when I re-ran the model. This is the first time I have seen validation $R^2$ systematically larger than training $R^2$.

I have no idea what this means. Any thoughts?

  • A single large outlier in the training set (and not in any of the CV sets) might cause the larger training error. Did you check the data distribution? (See the first sketch below these comments.) – Robert Kubrick Dec 31 '12 at 16:34
  • Doing 10-fold CV for 10 reps is really not enough; you might want to repeat the 10-fold CV at least 30 times. Better yet, use the bootstrap to get an estimate of your $R^2$ (see the bootstrap sketch below these comments). – user765195 Dec 31 '12 at 18:37
  • I have this same situation occurring frequently, with different data sets and many more reps. – Hack-R Feb 28 '16 at 20:20
  • There might also be other reasons. You could consider this site, which I also linked here; it explains the phenomenon quite understandably: https://www.pyimagesearch.com/2019/10/14/why-is-my-validation-loss-lower-than-my-training-loss/ – Cadoiz Aug 10 '20 at 03:16
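
A minimal sketch of the distribution check suggested in the first comment, flagging prices that sit far from the bulk of the data. The lognormal draw is only a placeholder for the 800 real car prices:

```python
# Flag targets more than 3 standard deviations from the mean.
# The lognormal sample is a placeholder for the actual price vector.
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=9.5, sigma=0.5, size=800)  # placeholder prices

z = (y - y.mean()) / y.std()
outliers = np.flatnonzero(np.abs(z) > 3)
print(f"{outliers.size} prices beyond 3 standard deviations:", outliers)
```

A skewed target like price often benefits from a log transform before modeling, which also tames this kind of outlier.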
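
And a minimal sketch of the bootstrap suggestion from the second comment: fit on each resample and score $R^2$ on the out-of-bag rows, giving a whole distribution of estimates rather than 20 point values. `MLPRegressor` and `make_regression` are again assumed stand-ins for the real model and data:

```python
# Out-of-bag bootstrap estimate of validation R^2.
# MLPRegressor and make_regression are stand-ins for the real model/data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

r2_oob = []
for _ in range(200):  # 200 bootstrap resamples
    idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
    oob = np.setdiff1d(np.arange(len(y)), idx)  # rows never drawn
    model = MLPRegressor(max_iter=2000, random_state=0).fit(X[idx], y[idx])
    r2_oob.append(model.score(X[oob], y[oob]))  # R^2 on out-of-bag rows

r2_oob = np.asarray(r2_oob)
print(f"bootstrap R2: mean = {r2_oob.mean():.3f}, "
      f"95% interval = [{np.percentile(r2_oob, 2.5):.3f}, "
      f"{np.percentile(r2_oob, 97.5):.3f}]")
```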

0 Answers