Equating the two equations of Ridge Regression

Question

I'm studying Ridge Regression now and I'm having a bit of trouble understanding how to relate the two equations that pop up when I read about it. There is the coefficient estimate: $$\hat{\beta} = (X^TX + \lambda I)^{-1}X^TY$$ and then there is simply writing it out as Linear Regression such that $\beta^T\beta < t$.

My specific question is: how do you relate $\lambda$ and $t$? As far as I know, the equation with $\lambda$ is used for actual computation, while simply imposing a restriction on linear regression helps with conceptualizing the method and relates to the graph given on page 271 of this journal article: http://statweb.stanford.edu/~tibs/lasso/lasso.pdf.

Any help/explanations would be very helpful, thanks!

score 1 · Answer 1 · answered Mar 24 '15 at 14:29

The two equations are the same (I drop the factor $1/2$, though; it is not in Tibshirani either):

Write out the second expression, the penalized least-squares objective $(Y-X\beta)^T(Y-X\beta)+\lambda\beta^T\beta$:

$$Y^TY-2\beta^TX^TY+\beta^TX^TX\beta+\lambda\beta^T\beta$$

Finding the $\beta$ that minimizes this expression gives us the $\text{argmin}$, so let us set the first derivative to zero:

$$-2X^TY+2X^TX\beta+2\lambda\beta\stackrel{!}{=}0$$

Solving for $\beta$, i.e. rearranging to $(X^TX+\lambda I)\beta = X^TY$, yields the ridge estimator.
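As a quick numerical sanity check of this derivation, here is a minimal sketch in NumPy. The data are synthetic, and the names `ridge_closed_form` and `penalised_objective` as well as the value of `lam` are my own illustrative choices, not anything from the question. It checks that the closed-form estimator makes the first derivative vanish and that moving away from it does not lower the objective.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed; synthetic data (assumption)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)
lam = 2.0                                 # illustrative penalty value

def ridge_closed_form(X, Y, lam):
    """Solve (X'X + lam*I) beta = X'Y without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def penalised_objective(beta, X, Y, lam):
    """(Y - X beta)'(Y - X beta) + lam * beta'beta."""
    resid = Y - X @ beta
    return resid @ resid + lam * (beta @ beta)

beta_hat = ridge_closed_form(X, Y, lam)

# First-order condition: -2X'Y + 2X'X beta + 2*lam*beta should be (numerically) zero.
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat + 2 * lam * beta_hat
print("gradient vanishes:", np.allclose(gradient, 0.0, atol=1e-8))

# The objective is strictly convex for lam > 0, so a perturbed beta must do worse.
perturbed = beta_hat + 0.01 * rng.normal(size=p)
obj_hat = penalised_objective(beta_hat, X, Y, lam)
obj_perturbed = penalised_objective(perturbed, X, Y, lam)
print("perturbation increases objective:", obj_perturbed > obj_hat)
```

Using `np.linalg.solve` instead of an explicit inverse is only a numerical convenience; it computes the same $\hat{\beta}$.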

As for your interpretation, I would agree, yes.

Christoph Hanck

So it appears I'm a little more tired than I thought. I edited my question, but what I really meant was BOTH those equations and simply writing linear regression such that $\beta^T\beta > t$ – the_deuce Mar 24 '15 at 14:40
I see. You also mean $\beta^T\beta < t$? – Christoph Hanck Mar 24 '15 at 14:47
Yes, that's correct, I'm a mess! Thanks for the reply. Do you have a suggestion of a source I can read that talks about the 1-1 relationship? Most articles I have found are pretty advanced and gloss over it, and most don't even mention the $t$ restriction. If not, I can keep searching; I'm sure I'll find something. – the_deuce Mar 24 '15 at 15:06
The books by Tibshirani (and others) himself are my favorites, see [here](http://www-bcf.usc.edu/~gareth/ISL/) and [here](http://statweb.stanford.edu/~tibs/ElemStatLearn/). – Christoph Hanck Mar 24 '15 at 15:35

score 1 · Answer 2 · edited Apr 13 '17 at 12:44

The first equation is the solution of an unconstrained optimisation problem that you get from constructing the Lagrangian of the constrained optimisation problem. See my answer to this related question. There isn't usually a simple relationship between $\lambda$ and $t$, but there is a mapping between the two. It is possible to fit the model using either the unconstrained problem or the constrained one, but unconstrained optimisation problems are usually easier to solve than ones with constraints, so the "normal" equations are the way the method is normally implemented.
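To make the mapping concrete, here is a rough sketch on synthetic data. The helper names `t_of_lambda` and `lambda_of_t` are hypothetical, and bisection is just one simple way to invert the map. For each penalty, the matching constraint level is the squared norm of the ridge solution, which shrinks monotonically as the penalty grows, so a given $t$ below the least-squares norm can be traced back to its penalty numerically even though no closed form is available in general.

```python
import numpy as np

rng = np.random.default_rng(1)            # synthetic data (assumption)
X = rng.normal(size=(60, 4))
Y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=60)

def ridge(lam):
    """Ridge estimator (X'X + lam*I)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def t_of_lambda(lam):
    """Constraint level matched by lam: the squared norm of the ridge solution."""
    b = ridge(lam)
    return b @ b

def lambda_of_t(t, hi=1e8, tol=1e-10):
    """Invert t_of_lambda by bisection (hypothetical helper; assumes t > 0)."""
    if t >= t_of_lambda(0.0):
        return 0.0                        # constraint is slack: least squares already fits
    lo = 0.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if t_of_lambda(mid) > t:          # norm still too large, so penalise more
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
ts = [t_of_lambda(l) for l in lams]
for l, t in zip(lams, ts):
    print(f"lambda = {l:7.1f}  ->  t = {t:.4f}")

t_target = 0.5 * t_of_lambda(0.0)         # ask for half the least-squares squared norm
lam_star = lambda_of_t(t_target)
print("recovered lambda:", lam_star, " round trip t:", t_of_lambda(lam_star))
```

The mapping depends on the data through $X$ and $Y$, which is why it cannot be written down once and for all.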

score 0 · Answer 3 · answered Mar 24 '15 at 17:02

I think it's a result of the Karush-Kuhn-Tucker conditions, which generalise the Lagrange multiplier case. And yes, there is no explicit relation between $\lambda$ and $t$.

See in particular the "Value function" section (which I have modified, hopefully correctly):

Given this definition, the coefficient, $\lambda$, is the rate at which the value function increases as $t$ increases.
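A small numerical check of this statement, as a sketch on synthetic data with my own helper names. I assume here that the value function is the minimised residual sum of squares subject to $\beta^T\beta \le t$ with the constraint binding; under that convention it decreases as $t$ grows, so the slope should be $-\lambda$. With the opposite sign convention (maximising the negative residual sum of squares) the slope is $+\lambda$, which matches the wording above. The code traces the constrained solutions through the penalty and takes a finite-difference ratio.

```python
import numpy as np

rng = np.random.default_rng(2)            # synthetic data (assumption)
X = rng.normal(size=(60, 4))
Y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=60)

def ridge(lam):
    """Ridge estimator; also the constrained minimiser at t = norm_sq(lam)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def norm_sq(lam):
    """Constraint level t reached by the ridge solution for this lam."""
    b = ridge(lam)
    return b @ b

def rss(lam):
    """Residual sum of squares at the ridge solution, i.e. the value at t = norm_sq(lam)."""
    r = Y - X @ ridge(lam)
    return r @ r

lam, h = 3.0, 1e-4                         # illustrative penalty and step size
# dV/dt approximated as (change in RSS) / (change in t) along the ridge path.
slope = (rss(lam + h) - rss(lam - h)) / (norm_sq(lam + h) - norm_sq(lam - h))
print("finite-difference dV/dt:", slope, " expected -lambda:", -lam)
```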
