
Cross-posted from my identical question on math.stackexchange:

Given a matrix $X$ and a vector $\vec{y}$, ordinary least squares (OLS) regression tries to find $\vec{c}$ such that $\left\| X \vec{c} - \vec{y} \right\|_2^2$ is minimal (where $\left\| \vec{v}\right\|_2^2=\vec{v} \cdot \vec{v}$).

Ridge regression tries to find $\vec{c}$ such that $\left\| X \vec{c} - \vec{y} \right\|_2^2 + \left\| \Gamma \vec{c} \right\|_2^2 $ is minimal.

However, I have an application where I need to minimize not the sum of squared errors, but the square root of this sum. Since the square root is an increasing function, this minimum is attained at the same $\vec{c}$, so OLS regression will still give the same result. But will ridge regression?

On the one hand, I don't see how minimizing $\left\| X \vec{c} - \vec{y} \right\|_2^2 + \left\| \Gamma \vec{c} \right\|_2^2 $ will necessarily result in the same $\vec{c}$ as minimizing $\sqrt{ \left\| X \vec{c} - \vec{y} \right\|_2^2 } + \left\| \Gamma \vec{c} \right\|_2^2 $.

On the other hand, I've read (though never seen it shown) that minimizing $\left\| X \vec{c} - \vec{y} \right\|_2^2 + \left\| \Gamma \vec{c} \right\|_2^2 $ (ridge regression) is equivalent to minimizing $\left\| X \vec{c} - \vec{y} \right\|_2^2$ under the constraint that $ \left\|\Gamma \vec{c}\right\|_2^2 < t$, where $t$ is some parameter. And if this is the case, then it should result in the same solution as minimizing $\sqrt{ \left\| X \vec{c} - \vec{y} \right\|_2^2}$ under the same constraint.
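To make the comparison concrete, here is a minimal numerical sketch (not part of the original question; the data, the value of `lam`, and the use of `scipy.optimize.minimize` are illustrative assumptions). It compares the minimizer of the penalized sum of squares with the minimizer of the penalized *root* sum of squares for the same $\Gamma = \lambda I$:

```python
# Compare ridge (penalized SSE) with the penalized root-SSE for the same Gamma.
# Data and lam are made up purely for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
lam = 1.0  # Gamma = lam * I

def sse(c):
    r = X @ c - y
    return r @ r  # ||X c - y||_2^2

penalized_sse = lambda c: sse(c) + lam**2 * (c @ c)               # ridge objective
penalized_root_sse = lambda c: np.sqrt(sse(c)) + lam**2 * (c @ c)  # root-SSE objective

c0 = np.zeros(3)
c_sse = minimize(penalized_sse, c0).x
c_root = minimize(penalized_root_sse, c0).x

print(c_sse)   # the usual ridge solution (also available in closed form)
print(c_root)  # generally a different vector for the same Gamma
```

On such made-up data the two minimizers generally differ, which is what prompted the question.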

acdr
  • @RichardHardy But how can that be? Surely, the minimum of $f(c)$ will happen at the same values of $c$ as the minimum of $\sqrt{f(c)}$, considering $\sqrt{x}$ is an increasing function. (Assuming $f(c)$ is positive so the square root is real. Otherwise, "minimum" may not be well-defined.) – acdr Aug 27 '18 at 11:43
  • Sorry, got the last point wrong. The corresponding $t$s will not be related as $t_0$ and $\sqrt{t_0}$, the relationship will be different. Deleted my comment. – Richard Hardy Aug 27 '18 at 12:23
  • Please note that our [help/on-topic] says "Please note, however, that cross-posting is not encouraged on SE sites. Choose one best location to post your question. Later, if it proves better suited on another site, it can be migrated." – Glen_b Aug 28 '18 at 11:56

1 Answer


Minimizing

$$ \left\| X \vec{c} - \vec{y} \right\|_2^2 + \left\| \Gamma \vec{c} \right\|_2^2 $$

and minimizing

$$ \sqrt{\left\| X \vec{c} - \vec{y} \right\|_2^2} + \left\| \Gamma \vec{c} \right\|_2^2 $$

do not directly relate to minimizing ${\left\| X \vec{c} - \vec{y} \right\|_2^2}$ or $\sqrt{\left\| X \vec{c} - \vec{y} \right\|_2^2}$ under the constraint $\left\|\Gamma\vec{c}\right\|_2^2 < t$.

There needs to be a conversion between $t$ and $\Gamma$, and it is different for the two cost functions. Thus minimizing the MSE and the RMSE with the same penalty term defined by $\Gamma$ corresponds to constrained minimizations with different constraints $t$.
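One way to make that conversion explicit (the hat notation below is mine, not from the original answer): if $\hat{c}_{\text{MSE}}(\Gamma)$ and $\hat{c}_{\text{RMSE}}(\Gamma)$ denote the minimizers of the two penalized objectives for a given $\Gamma$, then each also solves the corresponding constrained problem with constraint level

$$ t_{\text{MSE}} = \left\| \Gamma\, \hat{c}_{\text{MSE}}(\Gamma) \right\|_2^2, \qquad t_{\text{RMSE}} = \left\| \Gamma\, \hat{c}_{\text{RMSE}}(\Gamma) \right\|_2^2, $$

and these two values of $t$ differ in general, since the two minimizers differ.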

Note that for every solution $\vec{c}$ obtained by minimizing the MSE with penalty term $\Gamma_1$, there is a penalty term $\Gamma_2$ that results in the same solution $\vec{c}$ when minimizing the penalized RMSE. So for many practical purposes you can use any method/software that solves the penalized MSE problem; you only need to use a different cost function when, for instance, performing cross-validation to select the ideal $\Gamma$.
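A sketch of why such a $\Gamma_2$ exists (assuming the residual is nonzero, $X\vec{c}^{\,*} \neq \vec{y}$, so the square root is differentiable; the notation $\vec{c}^{\,*}$ is mine): setting the gradients of the two penalized objectives to zero at the common solution $\vec{c}^{\,*}$ gives

$$ X^\top\!\left(X \vec{c}^{\,*} - \vec{y}\right) + \Gamma_1^\top \Gamma_1\, \vec{c}^{\,*} = 0 \quad\text{and}\quad X^\top\!\left(X \vec{c}^{\,*} - \vec{y}\right) + 2 \left\| X \vec{c}^{\,*} - \vec{y} \right\|_2 \Gamma_2^\top \Gamma_2\, \vec{c}^{\,*} = 0, $$

so the same $\vec{c}^{\,*}$ satisfies both whenever $2 \left\| X \vec{c}^{\,*} - \vec{y} \right\|_2\, \Gamma_2^\top \Gamma_2 = \Gamma_1^\top \Gamma_1$. Because this matching condition involves $\left\| X \vec{c}^{\,*} - \vec{y} \right\|_2$ itself, the relation between $\Gamma_1$ and $\Gamma_2$ depends on the solution and is not available in closed form, which is the "indirect" relation mentioned in the comments below.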

Sextus Empiricus
  • What do you mean by the "do not directly relate" part? Am I to understand that ridge regression (penalizing $\left\| \Gamma \vec{c} \right\|_2^2 $) is not equivalent to solving the OLS regression under the constraint that $\left\|\vec{c}\right\|_2^2 < t$, for *some* value of $t$? That seems to disagree with what I'm reading elsewhere (e.g. https://math.stackexchange.com/questions/335306/why-are-additional-constraint-and-penalty-term-equivalent-in-ridge-regression). Though I'll happily admit that I don't actually follow the derivations, so I may be misunderstanding their implications. – acdr Aug 27 '18 at 13:24
  • @acdr I meant that you will not *directly* have $t=\Gamma$, but instead you need some function (which I believe does not exist in closed form) to *indirectly* relate $\Gamma$ and $t$. This function will be different for the two cases. This is how the solutions of the two cases can be the same for the same $t$ in the restricted formulation but different for the same $\Gamma$ in the penalized formulation. – Sextus Empiricus Aug 27 '18 at 13:53
  • Ah, I think I get it. Minimizing the regularized MSE is equivalent to minimizing the UNregularized MSE under some constraint. But minimizing the regularized RMSE is equivalent to minimizing the unregularized MSE *under a different constraint*, so the solution will be different. – acdr Aug 28 '18 at 07:02
  • Yes, like that. If your starting point is the same regularization parameter $\Gamma$, then you will end up with different constraint parameters $t$ for RMSE vs MSE. Or in reverse: if you start with the same constraint parameter $t$ (for which the RMSE and MSE solutions are equal), then you will end up with different regularization parameters $\Gamma$ (for which the RMSE and MSE solutions will also be equal).