Cross validation methods and overfitting issue

Question

I'm struggling to prevent overfitting with my model and need some clarification, if anyone can help.

I started with 5 folds and it didn't seem to help when running hyperparameter tuning.

I switched to 10 folds, which I thought might make a difference, but alas no, it's hardly affected my overfitting.

I then decided to try 10x10 folds which has helped somewhat but not enough.

Is it wrong to add the min() score into the mean calculation at the end to help lower variance? I.e., the tuning would favor a configuration whose worst fold scores higher, since a low min() would drag the average down. Would that even help?

Like this: [.75, .78, .65] would be worse than [.75, .78, .69].

How about some sort of calculation including the stdev of each fold to try and bring the values closer to the mean? I.e., giving a negative weight to scores with higher stddev?

Is this even a thing? I've never read about it!!

Also, what is an acceptable difference between the test and training score? I'm using accuracy currently and see a 4-5% difference between the two. Is this normal, or am I splitting hairs?

score 1 · Answer 1 · answered Feb 08 '20 at 21:06

I switched to 10 folds, which I thought might make a difference, but alas no, it's hardly affected my overfitting.

When you are choosing the apparently best set of hyperparameters, the choice of $k$ should have essentially no effect on the resulting model.

I then decided to try 10x10 folds which has helped somewhat but not enough.

This is either spurious (see above) or points to the models being unstable.
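One way to check for instability is to compare the average score of each repetition: every repetition tests the same cases, so differences between repetitions come only from the changed splits. A minimal sketch in plain Python, assuming a hypothetical `score_fn(train_idx, test_idx)` that fits your model on the training indices and returns the accuracy on the test indices (the function names here are illustrative, and stratification is left out for brevity):

```python
import random
import statistics

def repeated_kfold(n_samples, k, repeats, seed=0):
    """Yield (repeat, train_idx, test_idx) for `repeats` independent shufflings."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal the shuffled indices into k folds of (nearly) equal size.
        folds = [idx[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield r, train_idx, test_idx

def per_repeat_means(score_fn, n_samples, k, repeats, seed=0):
    """Average the k fold scores within each repetition."""
    by_repeat = {}
    for r, train_idx, test_idx in repeated_kfold(n_samples, k, repeats, seed):
        by_repeat.setdefault(r, []).append(score_fn(train_idx, test_idx))
    return [statistics.mean(scores) for _, scores in sorted(by_repeat.items())]
```

If `statistics.stdev(per_repeat_means(...))` is large compared to the differences between the hyperparameter candidates, the models are unstable and the tuning is mostly chasing noise.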

Is it wrong to add the min() score into the mean calculation at the end to help lower variance?

You can use the observed spread in your optimization heuristic. Whether it really helps is a different question (the observed min of very few repetitions or folds has very high variance).

It's not going to lower any variance per se, though.
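To make the two ideas from the question concrete, here is a small sketch using the two score lists given there. The function names and the penalty `weight` are assumptions for illustration, not an established recipe:

```python
import statistics

def mean_with_min(scores):
    """Mean with the worst fold counted twice (the min() idea from the question)."""
    return statistics.mean(list(scores) + [min(scores)])

def spread_penalized(scores, weight=1.0):
    """Mean minus `weight` sample standard deviations; `weight` is an arbitrary choice."""
    return statistics.mean(scores) - weight * statistics.stdev(scores)

a = [0.75, 0.78, 0.65]
b = [0.75, 0.78, 0.69]

for name, scores in (("a", a), ("b", b)):
    print(name, round(mean_with_min(scores), 4), round(spread_penalized(scores), 4))
```

Both criteria rank `b` above `a`, but with only three folds the min and the standard deviation are themselves very noisy estimates, so the ranking can easily flip on a different split.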

How about some sort of calculation including the stdev of each fold to try and bring the values closer to the mean? I.e., giving a negative weight to scores with higher stddev? Is this even a thing? I've never read about it!!

Yes, this is a thing: there is a hyperparameter optimization heuristic known as the 1-sd rule (The Elements of Statistical Learning calls it the "one-standard error" rule). It picks the least complex model whose score is within one standard error of the best score.
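A minimal sketch of that rule, assuming higher scores are better and that the candidates are listed from least to most complex. The candidate names and fold scores below are made up purely for illustration:

```python
import math
import statistics

def one_se_rule(candidates):
    """Pick the simplest candidate within one standard error of the best mean.

    `candidates` is a list of (name, fold_scores), ordered simplest first.
    """
    summaries = []
    for name, scores in candidates:
        mean = statistics.mean(scores)
        se = statistics.stdev(scores) / math.sqrt(len(scores))
        summaries.append((name, mean, se))
    _, best_mean, best_se = max(summaries, key=lambda s: s[1])
    threshold = best_mean - best_se
    for name, mean, _ in summaries:
        if mean >= threshold:
            return name  # first hit is the simplest acceptable candidate

# Hypothetical per-fold accuracies, not real results.
candidates = [
    ("depth=2", [0.70, 0.72, 0.71, 0.69, 0.71]),
    ("depth=4", [0.74, 0.76, 0.73, 0.75, 0.74]),
    ("depth=8", [0.75, 0.78, 0.72, 0.77, 0.73]),
]
chosen = one_se_rule(candidates)
print(chosen)
```

Here the most complex candidate has the best mean, but the middle one is within one standard error of it and is chosen instead. Note that fold scores are not independent, so the standard error computed this way is only a rough guide.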

Also, what is an acceptable difference between the test and training score? I'm using accuracy currently and see a 4-5% difference between the two. Is this normal, or am I splitting hairs?

This cannot be answered without a) knowing the actual observed values and your sample size (to calculate whether the difference is significant) and b) application knowledge to judge what is to be expected and what is acceptable.
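To illustrate why the sample size matters, the sketch below computes an approximate 95% Wilson score interval for an observed accuracy. The accuracy of 0.75 is borrowed from the example scores in the question and the sample sizes are hypothetical; the interval also assumes independently tested cases:

```python
import math

def wilson_interval(accuracy, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion observed on n cases."""
    denom = 1 + z**2 / n
    center = (accuracy + z**2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for n in (100, 1000, 10000):
    low, high = wilson_interval(0.75, n)
    print(n, round(low, 3), round(high, 3))
```

With only about 100 tested cases the interval is far wider than 4-5 percentage points, so a difference of that size could be noise; with 10,000 cases it would not be.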

You may also want to look up proper scoring rules.
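As a rough illustration of why: accuracy only looks at which side of the threshold a prediction falls on, whereas a proper scoring rule such as the Brier score also rewards well-calibrated probabilities. The labels and predicted probabilities below are invented for the example:

```python
def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and label (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

# Hypothetical labels and predicted probabilities for the positive class.
y = [1, 0, 1, 1]
p_confident = [0.90, 0.10, 0.80, 0.90]
p_hesitant = [0.60, 0.40, 0.55, 0.60]

for name, p in (("confident", p_confident), ("hesitant", p_hesitant)):
    print(name, accuracy(y, p), round(brier_score(y, p), 4))
```

Both sets of predictions have the same accuracy, but the Brier score separates them. That finer resolution usually means less noisy cross-validation results to tune on.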
