
In a typical neural network, what is the common way to add regularization?

Assume a regression task, where the error loss is mean squared error.

Then we have two choices of regularization on the weights:

  1. $\lambda \sum \|W\|^2$
  2. $\lambda \cdot \textbf{average}\, \|W\|^2$

I have seen most people use the first option; I'm just curious why.
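For concreteness, here is a minimal NumPy sketch (my own illustration, not from the question; the weight vector and $\lambda$ value are hypothetical) showing that the two penalties differ only by a constant factor, namely the number of weights:

```python
import numpy as np

# Illustrative values; the weight vector and lambda are hypothetical.
rng = np.random.default_rng(0)
w = rng.normal(size=100)   # flattened weight vector of a network
lam = 0.01

penalty_sum = lam * np.sum(w ** 2)    # option 1: sum of squared weights
penalty_avg = lam * np.mean(w ** 2)   # option 2: average of squared weights

# The two differ only by the scalar factor len(w):
assert np.isclose(penalty_sum, len(w) * penalty_avg)
```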

  • depending on what you mean by average, the two should be equivalent as they differ by the scalar value of the number of samples. – meh Jul 10 '18 at 21:02
  • What are you summing/averaging over? It's not clear from your expressions – user20160 Jul 10 '18 at 21:24
  • Is the difference between the two that the second $\lambda$ will be $\lambda/n$ of the original? If so, I'm not sure it really matters much. – Anonymous Emu Jul 10 '18 at 22:14
  • I agree with @AnonymousEmu, it's just a different scale for the $\lambda$ variable. With the average, you just implicitly reduce the value of $\lambda$. – itdxer Jul 17 '18 at 15:22

1 Answer


Using the average implicitly rescales $\lambda$. This means that choosing the average or the sum isn't really consequential, because whatever the optimal $\lambda$ is on the mean scale has an equivalent choice of $\lambda$ on the sum scale, and vice versa: $$ \lambda \sum_i w_i^2 = n\lambda \left[\frac{1}{n}\sum_i w_i^2 \right], $$ so a sum penalty with coefficient $\lambda$ is identical to a mean penalty with coefficient $n\lambda$, where $n$ is the number of weights being penalized.
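To make the rescaling concrete, here is a short numeric check (an illustration under the same notation, not part of the original answer): the gradients of the two penalties coincide once $\lambda$ is rescaled by $n$.

```python
import numpy as np

# Hypothetical weights and regularization strength.
w = np.array([0.5, -1.2, 0.3])
n = len(w)
lam_sum = 0.1
lam_mean = n * lam_sum   # equivalent lambda on the mean scale

# Gradient of lam_sum * sum(w^2) vs. gradient of lam_mean * mean(w^2):
grad_sum = 2 * lam_sum * w
grad_mean = 2 * (lam_mean / n) * w
assert np.allclose(grad_sum, grad_mean)
```

So any gradient-based optimizer sees identical updates from the two penalties once $\lambda$ is rescaled accordingly.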

Sycorax