This post follows this one: Why does ridge estimate become better than OLS by adding a constant to the diagonal?
Here is my question:
As far as I know, ridge regularization uses an $\ell_2$-norm (Euclidean distance). But why do we use the square of this norm? (A direct application of the $\ell_2$-norm would give the square root of the sum of the squared betas.)
By comparison, we don't do this for the LASSO, which uses an $\ell_1$-norm to regularize. There it is the "real" $\ell_1$-norm (just the sum of the absolute values of the betas, not the square of that sum).
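To make the comparison concrete, here are the penalties as I understand them, assuming the usual penalized least-squares setup. The symbols $y$, $X$, $\lambda$ and $p$ are my own notation for the response, the design matrix, the penalty weight and the number of coefficients:

$$
\text{ridge:}\quad \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,
\qquad \|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2
$$

$$
\text{unsquared } \ell_2\text{:}\quad \|\beta\|_2 = \sqrt{\sum_{j=1}^{p} \beta_j^2}
$$

$$
\text{LASSO:}\quad \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1,
\qquad \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|
$$

So the ridge penalty is the square of the norm, while the LASSO penalty is the norm itself.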
Can someone help me clarify this?