In ridge regression, we generally set a positive $\lambda$ for regularization to get smaller residuals. Why can't we have a negative $\lambda$ for regularization if we could benefit from it?
- We do not set a positive $\lambda$ to get smaller residuals; any positive $\lambda$ will produce larger residuals than setting $\lambda = 0$. – jbowman Nov 17 '19 at 03:05 (a numerical check of this point follows the comments below)
- Also see [understanding negative ridge regression](https://stats.stackexchange.com/questions/331264/understanding-negative-ridge-regression) – user20160 Nov 17 '19 at 03:14
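As a quick numerical check of jbowman's point above, here is a minimal numpy sketch (my own illustration, not part of the original thread; the sizes, seed, and data below are arbitrary choices). It fits ridge regression in closed form and prints the training residual sum of squares, which is smallest at $\lambda = 0$ and grows as $\lambda$ increases.

```python
import numpy as np

# Closed-form ridge fits on a small synthetic problem (arbitrary sizes and seed):
# the training residual sum of squares is smallest at lambda = 0 and grows with lambda.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

def ridge_fit(lam):
    """Ridge estimate (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [0.0, 0.1, 1.0, 10.0]:
    theta = ridge_fit(lam)
    rss = np.sum((X @ theta - y) ** 2)
    print(f"lambda = {lam:5.1f}   training RSS = {rss:.4f}")
```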
1 Answer
There are two (equivalent) formulations of ridge regression (I mention both because I'm not sure which version you're referring to). If $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the design matrix and $\mathbf{y} \in \mathbb{R}^n$ is the target vector, then the two formulations are the following ($\|\cdot\|_2$ denotes the $L^2$ norm).
The constrained optimization form: $$ \hat{\boldsymbol{\theta}}_{\text{ridge}} = \operatorname*{arg\,min}_{\substack{\boldsymbol{\theta} \in \mathbb{R}^p \\ \|\boldsymbol{\theta}\|_2 \leq \lambda}} \left\|\mathbf{X}\boldsymbol{\theta} - \mathbf{y}\right\|_2^2. $$ Here, if $\lambda < 0$, then the constraint $\|\boldsymbol{\theta}\|_2 \leq \lambda$ never holds (a norm is never negative), so the feasible set is empty and the optimization problem is ill-defined.
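As a rough sketch of this formulation (my own illustration; the data, solver choice, and settings are arbitrary assumptions), one can hand the constrained problem to a generic solver such as `scipy.optimize.minimize`: with a positive $\lambda$ the returned estimate has its norm capped at $\lambda$, while with a negative $\lambda$ no feasible point exists, so the solver can only report failure.

```python
import numpy as np
from scipy.optimize import minimize

# Constrained form: minimize ||X theta - y||^2 subject to ||theta||_2 <= lam.
rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -3.0, 1.5, 0.5]) + 0.1 * rng.standard_normal(n)

def constrained_ridge(lam):
    objective = lambda theta: np.sum((X @ theta - y) ** 2)
    constraint = {"type": "ineq", "fun": lambda theta: lam - np.linalg.norm(theta)}
    return minimize(objective, x0=np.full(p, 0.1),
                    constraints=[constraint], method="SLSQP")

res = constrained_ridge(1.0)
print(np.linalg.norm(res.x))      # roughly 1: the norm bound is active here
res = constrained_ridge(-1.0)
print(res.success, res.message)   # expected to report failure: no theta has ||theta||_2 <= -1
```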
The Lagrangian form: $$ \hat{\boldsymbol{\theta}}_{\text{ridge}} = \operatorname*{arg\,min}_{\boldsymbol{\theta} \in \mathbb{R}^p} \left( \left\|\mathbf{X}\boldsymbol{\theta} - \mathbf{y}\right\|_2^2 + \lambda \|\boldsymbol{\theta}\|_2^2\right) $$ (this $\lambda$ is not the same as the $\lambda$ in the first formulation, but they are related). Here, if $\lambda < 0$, then the optimization problem encourages the $L^2$ norm of the vector $\boldsymbol{\theta}$ to get as large as possible, which goes against the point of regularization.
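To make this concrete, here is a minimal sketch (again my own illustration, with arbitrary sizes and seed) in a setting with more features than samples, so $\mathbf{X}$ has a nontrivial null space. Moving $\boldsymbol{\theta}$ along a null-space direction leaves the fit term fixed while the penalty $\lambda \|\boldsymbol{\theta}\|_2^2$ with $\lambda < 0$ drives the objective to $-\infty$ (with a full-column-rank $\mathbf{X}$, the objective becomes unbounded below once $-\lambda$ exceeds the smallest eigenvalue of $\mathbf{X}^\top \mathbf{X}$).

```python
import numpy as np

# With lambda < 0, the penalized objective can decrease without bound.
rng = np.random.default_rng(2)
n, p = 5, 10                        # more features than samples, so X has a null space
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = -0.1                          # a negative "regularization" strength

def objective(theta):
    return np.sum((X @ theta - y) ** 2) + lam * np.sum(theta ** 2)

# A unit vector v with X @ v = 0 (a right singular vector for a zero singular value).
v = np.linalg.svd(X)[2][-1]
assert np.allclose(X @ v, 0.0, atol=1e-10)

for t in [1.0, 10.0, 100.0, 1000.0]:
    print(t, objective(t * v))      # equals ||y||^2 + lam * t^2, decreasing without bound
```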
Thus, in both formulations of ridge regression, choosing a negative $\lambda$ leads to undesirable results.

- In the Lagrangian form with negative $\lambda$, there is no minimum because the norm of $\theta$ can grow without bound. But apparently people *have* used variants of 'negative ridge regression' (e.g. the paper referenced in [this thread](https://stats.stackexchange.com/questions/331264/understanding-negative-ridge-regression)) – user20160 Nov 17 '19 at 03:32
- @user20160 interesting! I hadn't heard of negative ridge regression, thanks for the link! – Artem Mavrin Nov 17 '19 at 03:37