I am running a neural net to predict used car prices; the sample size is 800. Using both 10-fold cross validation (repeated 10 times) and a 1/3 holdback (repeated 10 times), the training $R^2$ is about 0.60 and the validation $R^2$ is about 0.68 across all 20 runs. The smallest gap among the 20 runs is training $R^2 = 0.64$ versus validation $R^2 = 0.68$, so the training $R^2$ is always less than the validation $R^2$.
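
For reference, here is a minimal sketch of the evaluation protocol described above. It assumes scikit-learn's `MLPRegressor` and synthetic data from `make_regression` as stand-ins for my actual network and the car-price data; both are placeholders, not my real setup:

```python
# Sketch of the protocol: repeated 10-fold CV and repeated 1/3 holdback,
# comparing training and validation R^2 on each run.
# MLPRegressor and make_regression are stand-ins for the real model/data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_validate, train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)

# 10-fold cross validation, repeated 10 times with different shuffles.
for rep in range(10):
    cv = KFold(n_splits=10, shuffle=True, random_state=rep)
    scores = cross_validate(MLPRegressor(max_iter=2000, random_state=rep),
                            X, y, cv=cv, scoring="r2", return_train_score=True)
    print(f"CV rep {rep}: train R2 = {scores['train_score'].mean():.3f}, "
          f"validation R2 = {scores['test_score'].mean():.3f}")

# 1/3 holdback, repeated 10 times with different splits.
for rep in range(10):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=1/3,
                                                random_state=rep)
    model = MLPRegressor(max_iter=2000, random_state=rep).fit(X_tr, y_tr)
    print(f"holdout rep {rep}: train R2 = {model.score(X_tr, y_tr):.3f}, "
          f"validation R2 = {model.score(X_val, y_val):.3f}")
```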

I am used to seeing training $R^2$ larger than validation $R^2$, which indicates overfitting. In the past, when I have seen training $R^2$ less than validation $R^2$, it was a transient phenomenon that disappeared when I re-ran the model. This is the first time I have seen validation $R^2$ systematically larger than training $R^2$.

I have no idea what this means. Any thoughts?

  • A single large outlier in the training set (and not in any of the CV sets) might cause the larger training error. Did you check the data distribution? (See the first sketch below these comments.) – Robert Kubrick Dec 31 '12 at 16:34
  • Doing 10-fold CV for 10 reps is really not enough; you might want to repeat the 10-fold CV at least 30 times. Better yet, use the bootstrap to get an estimate of your $R^2$ (see the bootstrap sketch below these comments). – user765195 Dec 31 '12 at 18:37
  • I have this same situation occurring frequently, with different data sets and many more reps. – Hack-R Feb 28 '16 at 20:20
  • There might also be other reasons. You could consider this site, which I also linked here; it explains the phenomenon quite understandably: https://www.pyimagesearch.com/2019/10/14/why-is-my-validation-loss-lower-than-my-training-loss/ – Cadoiz Aug 10 '20 at 03:16
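
A minimal sketch of the distribution check suggested in the first comment, flagging prices that sit far from the bulk of the data. The lognormal draw is only a placeholder for the 800 real car prices:

```python
# Flag targets more than 3 standard deviations from the mean.
# The lognormal sample is a placeholder for the actual price vector.
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=9.5, sigma=0.5, size=800)  # placeholder prices

z = (y - y.mean()) / y.std()
outliers = np.flatnonzero(np.abs(z) > 3)
print(f"{outliers.size} prices beyond 3 standard deviations:", outliers)
```

A skewed target like price often benefits from a log transform before modeling, which also tames this kind of outlier.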
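
And a minimal sketch of the bootstrap suggestion from the second comment: fit on each resample and score $R^2$ on the out-of-bag rows, giving a whole distribution of estimates rather than 20 point values. `MLPRegressor` and `make_regression` are again assumed stand-ins for the real model and data:

```python
# Out-of-bag bootstrap estimate of validation R^2.
# MLPRegressor and make_regression are stand-ins for the real model/data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

r2_oob = []
for _ in range(200):  # 200 bootstrap resamples
    idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
    oob = np.setdiff1d(np.arange(len(y)), idx)  # rows never drawn
    model = MLPRegressor(max_iter=2000, random_state=0).fit(X[idx], y[idx])
    r2_oob.append(model.score(X[oob], y[oob]))  # R^2 on out-of-bag rows

r2_oob = np.asarray(r2_oob)
print(f"bootstrap R2: mean = {r2_oob.mean():.3f}, "
      f"95% interval = [{np.percentile(r2_oob, 2.5):.3f}, "
      f"{np.percentile(r2_oob, 97.5):.3f}]")
```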

0 Answers