The $\ell_2$-penalty term is said to be based on the $\ell_2$-norm. Indeed, the term is often written as $\lambda \|w\|^2_2$.

Notice, though, that the norm is squared, unlike the $\ell_1$-penalty, which is simply the $\ell_1$-norm. Squaring obviously helps when differentiating the function, but does it change the interpretation of the penalty term?
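
To make the differentiation point concrete, here is a quick sketch of the two gradients (my own working, assuming $w \neq 0$ in the unsquared case):

$$\nabla_w \left(\lambda \|w\|_2^2\right) = 2\lambda w, \qquad \nabla_w \left(\lambda \|w\|_2\right) = \lambda \frac{w}{\|w\|_2}.$$

The squared penalty has a gradient that is linear in $w$ and defined everywhere, whereas the unsquared penalty has a gradient of constant magnitude $\lambda$ and is not differentiable at $w = 0$.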

Would a regularization term like $\lambda \|w\|_2$ lead to different results, or are they equivalent?

How does this generalize to $\ell_p$ regularization with $1\lt p\lt2$? Should one take the $p^{th}$ root of the penalty term or not?

Firebug
  • @Ben That's a cool question, hadn't seen it before, thanks for the suggestion! I actually closed this one as duplicate because I was quite satisfied with [@bdeonovic's arguments](https://stats.stackexchange.com/a/120140/60613) toward ease of derivation. Now I can also see some arguments about priors on the coefficients would be as efficient. – Firebug Sep 12 '17 at 11:37
