There are so many regularization techniques that it's not practical to try out every combination:
- L1/L2
- max norm
- dropout
- early stopping
- ...
It seems that most people are happy with a combination of dropout and early stopping (a minimal sketch of that setup is below): are there cases where reaching for other techniques makes sense?
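To be concrete, this is the kind of default setup I mean, sketched in Keras; the layer sizes, dropout rate, and patience are arbitrary placeholders, not recommendations:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data just so the snippet runs end to end.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                   # dropout regularization
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping: halt training once validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early_stop])
```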
For example, if you want a sparse model, you can add in a bit of L1 regularization (also sketched below). Other than that, are there strong arguments in favor of sprinkling in other regularization techniques?
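As I understand it, adding that L1 term would just mean attaching a kernel_regularizer to the layers whose weights should become sparse (Keras again; the 1e-4 penalty strength is a placeholder):

```python
from tensorflow.keras import layers, regularizers

# Same hidden layer as above, but with an L1 penalty that pushes
# individual weights toward exactly zero (sparsity).
sparse_hidden = layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=regularizers.l1(1e-4),  # placeholder strength
)
```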
I know about the no-free-lunch theorem: in theory I would have to try out all combinations of regularization techniques, but it's not worth the effort if it almost never yields a significant performance boost.