
I had an idea but couldn't find anything about it on the internet. Is it common or beneficial to increase or decrease alpha (as in alpha * L2_norm) for each consecutive layer of a neural net? For example, when detecting edges in images we don't really need to regularize the early layers much, since small features (lines, circles, etc.) are fairly common, but big features (specific faces) are not.
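
To make the idea concrete, here is a rough sketch of what I mean, assuming the Keras API (`kernel_regularizer` with `regularizers.l2`); the alpha values below are made up, only to illustrate regularizing more strongly with depth:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Hypothetical per-layer L2 coefficients: small for early layers that learn
# generic features, larger for later, more task-specific layers.
alphas = [1e-5, 1e-4, 1e-3]

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1),
                  kernel_regularizer=regularizers.l2(alphas[0])),
    layers.Conv2D(64, 3, activation='relu',
                  kernel_regularizer=regularizers.l2(alphas[1])),
    layers.Flatten(),
    layers.Dense(10, activation='softmax',
                 kernel_regularizer=regularizers.l2(alphas[2])),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```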

Mariusz
    Good question. I actually mentioned this possibility in my answer [here](http://stats.stackexchange.com/questions/236259/applying-l1-l2-and-tikhonov-regularization-to-neural-nets-possible-misconcepti), but had no examples. A recent reference [says](http://www.deeplearningbook.org/contents/regularization.html) "In the context of neural networks, it is sometimes desirable to use a separate penalty with a different α coefficient for each layer of the network." But it does not give any examples. – GeoMatt22 Oct 25 '16 at 21:26

0 Answers