DNN Cannot Stop Overfitting

Question

I am training a DNN (CNN + RNN) for a voice conversion task. Although my training loss can get very low, with good performance, I believe I am experiencing massive overfitting. To overcome this, I have already added quite a bit of batch norm and dropout inside the model, as well as weight decay. However, the model still overfits a lot. I present some of my loss curves below:

With a weight decay constant of 1e-7:

With a weight decay constant of 1e-2:

Note that when the weight decay constant is greater than 1e-4, the model seems to underfit.

I want to know what else I can do to improve this model's generalization. Is it just a matter of more data, or do I need to modify my DNN architecture in some way? I have been struggling with this overfitting problem for some days now, and any insight would help.

What makes you think that weight decay of `1e-2` is overfitting? It looks like the training and validation sets have similar MSE, so it appears that this choice of weight decay is having the desired effect. — Sycorax, Jan 15 '21 at 03:06
This [thread](https://stats.stackexchange.com/questions/365778/what-should-i-do-when-my-neural-network-doesnt-generalize-well) provides a number of suggestions for how to address overfitting in neural networks. But as I've remarked in my comment, you may not have an overfitting problem at all, just a misunderstanding about what "overfitting" is. Responding to my comments or editing the post to clarify may allow this question to be reopened. — Sycorax, Jan 15 '21 at 15:52
Hi, thank you for your response. On the weight decay of `1e-2`, I actually believed that the model may be *under*fitting: as you can see, the training loss is far larger than in the experiment with a weight decay of `1e-7`. — user308258, Jan 16 '21 at 05:10
Why are x-scales so different between dev_loss (=?) and loss? — Michael M, Jan 16 '21 at 08:09

score 0 · Answer 1 · answered Jan 16 '21 at 20:04

I think what you're seeing with the weight decay of 1e-7 is a textbook case of overfitting. A larger weight decay value increases the model's training loss because the penalty shrinks the weights, which limits how closely the model can fit the training data. In the run with a weight decay of 1e-2, the training and holdout losses having approximately the same value shows that you've found the "goldilocks zone" that balances bias and variance for the model, which is what you're trying to achieve by using a regularization technique.
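To make the trade-off concrete, here is a minimal sketch that sweeps the penalty strength and reports training loss, holdout loss, and the gap between them. It is a toy stand-in, not your CNN + RNN: it assumes a degree-9 polynomial ridge regression on synthetic data, with an L2 penalty playing the role of weight decay. The helper names (`poly_features`, `solve`, `fit_ridge`, `mse`, `make_data`), the data, and the sweep values are all illustrative assumptions.

```python
import math
import random


def poly_features(x, degree):
    """Map a scalar x to [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_ridge(X, y, weight_decay):
    """Minimise sum of squared errors + weight_decay * ||w||^2."""
    d = len(X[0])
    # Normal equations: (X^T X + weight_decay * I) w = X^T y
    A = [[sum(row[i] * row[j] for row in X) + (weight_decay if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def make_data(n, rng):
    """Synthetic (assumed) data: a noisy sine curve on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(15, rng)   # small training set, easy to overfit
x_val, y_val = make_data(200, rng)      # larger holdout set
X_train = [poly_features(x, 9) for x in x_train]
X_val = [poly_features(x, 9) for x in x_val]

results = []  # (weight_decay, train_mse, val_mse)
for wd in [1e-7, 1e-4, 1e-2, 1.0, 100.0]:
    w = fit_ridge(X_train, y_train, wd)
    tr, va = mse(X_train, y_train, w), mse(X_val, y_val, w)
    results.append((wd, tr, va))
    print(f"wd={wd:8.0e}  train={tr:.4f}  val={va:.4f}  gap={va - tr:+.4f}")
```

When reading the output, the thing to look for is the pattern rather than the exact numbers: training loss can only rise as the penalty grows, and the setting worth keeping is the one with the lowest holdout loss, not the lowest training loss. With a real network the same sweep means retraining once per weight decay value and logging both losses.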
