0

I'm studying ridge regression now and I'm having a bit of trouble understanding how to relate the two formulations that pop up when I read about it. There is the coefficient estimate: $$\hat{\beta} = (X^TX + \lambda I)^{-1}X^TY$$ and then there is simply writing it out as linear regression subject to the constraint $\beta^T\beta < t$.

My specific question is: how do you relate $\lambda$ and $t$? As far as I know, the equation with $\lambda$ is used for actual computation, while imposing a restriction on linear regression helps with conceptualizing the method and relates to the graph given on page 271 of this journal article: http://statweb.stanford.edu/~tibs/lasso/lasso.pdf.

Any help/explanations would be very helpful, thanks!

the_deuce

3 Answers

1

The two equations are the same (I drop the factor $1/2$, though; it is not in Tibshirani either):

Write out the second expression:

$$Y^TY-2\beta^TX^TY+\beta^TX^TX\beta+\lambda\beta^T\beta$$

Finding the $\beta$ that minimizes this expression gives us the $\text{argmin}$, so let us set the first derivative to zero:

$$-2X^TY+2X^TX\beta+2\lambda\beta\stackrel{!}{=}0$$

Solving for $\beta$ yields the ridge estimator.
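
Spelled out, that last step is just a rearrangement of the first-order condition (a sketch; it assumes $\lambda > 0$, so that $X^TX + \lambda I$ is positive definite and hence invertible):

$$(X^TX+\lambda I)\beta = X^TY \quad\Longrightarrow\quad \hat{\beta} = (X^TX+\lambda I)^{-1}X^TY$$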

As for your interpretation, I would agree, yes.

Christoph Hanck
  • So it appears I'm a little more tired than I thought. I edited my question, but what I really meant was BOTH those equations and simply writing linear regression such that $\beta^T\beta > t$ – the_deuce Mar 24 '15 at 14:40
  • I see. You also mean $\beta^T\beta < t$? – Christoph Hanck Mar 24 '15 at 14:47
  • Yes, that's correct, I'm a mess! Thanks for the reply. Do you have a suggestion of a source I can read that talks about the 1-1 relationship? Most articles I have found are pretty advanced and gloss over it, and most don't even mention the $t$ restriction. If not, I can keep searching; I'm sure I'll find something. – the_deuce Mar 24 '15 at 15:06
  • The books by Tibshirani (and others) himself are my favorites, see [here](http://www-bcf.usc.edu/~gareth/ISL/) and [here](http://statweb.stanford.edu/~tibs/ElemStatLearn/). – Christoph Hanck Mar 24 '15 at 15:35
1

The first equation is the solution of an unconstrained optimisation problem that you get from constructing the Lagrangian of the constrained optimisation problem. See my answer to this related question. There isn't usually a simple relationship between $\lambda$ and $t$, but there is a mapping between the two. It is possible to fit the model using either the unconstrained problem or the constrained one, but unconstrained optimisation problems are usually easier to solve than ones with constraints, so the "normal" equations are the way they are normally implemented.
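
To make the mapping concrete, here is a minimal numerical sketch. It assumes NumPy and simulated data, and the helper names `ridge`, `t_of_lambda` and `lambda_of_t` are made up for illustration. Each penalty $\lambda$ implies a bound $t(\lambda) = \hat{\beta}(\lambda)^T\hat{\beta}(\lambda)$, which shrinks as $\lambda$ grows, so the map can be inverted numerically (here by bisection) even though it has no simple closed form.

```python
import numpy as np

# Simulated data; purely illustrative, not from the question.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

def ridge(lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def t_of_lambda(lam):
    """Squared norm of the ridge solution: the t implied by lam."""
    b = ridge(lam)
    return b @ b

def lambda_of_t(t, lo=0.0, hi=1e6, iters=200):
    """Invert t_of_lambda by bisection.

    Assumes t_of_lambda(hi) < t < t_of_lambda(0), i.e. the constraint binds.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t_of_lambda(mid) > t:
            lo = mid  # norm still too large: more shrinkage needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

lams = [0.0, 1.0, 10.0, 100.0]
ts = [t_of_lambda(l) for l in lams]      # decreasing in lam
lam_back = lambda_of_t(t_of_lambda(10.0))  # should recover roughly 10
print(ts, lam_back)
```

The mapping depends on the data through $X$ and $Y$, which is one way to see why no data-free formula links $\lambda$ and $t$.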

Dikran Marsupial
0

I think it is a result of the Karush-Kuhn-Tucker conditions, which generalise the Lagrange multiplier case. And yes, there is no explicit relation between $\lambda$ and $t$.

See in particular the Value function section (which I have modified, hopefully correctly):

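Given this definition, the coefficient, $\lambda$, is the rate at which the value function changes as $t$ increases. For ridge regression, where the value function is the minimised residual sum of squares subject to $\beta^T\beta \le t$, this means the minimised value decreases at rate $\lambda$ as the constraint is relaxed.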
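
A rough numerical check of this interpretation, as a sketch under assumptions: NumPy, simulated data, the penalised form $\|Y - X\beta\|^2 + \lambda\beta^T\beta$ without the factor $1/2$, and a binding constraint. The helper names are hypothetical. Moving along the ridge path, the change in the minimised residual sum of squares divided by the change in $t = \beta^T\beta$ should come out close to $-\lambda$.

```python
import numpy as np

# Simulated data; purely illustrative.
rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, p))
Y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(size=n)

def ridge(lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def rss(beta):
    """Residual sum of squares: the value function along the ridge path."""
    r = Y - X @ beta
    return r @ r

def norm2(beta):
    """Squared norm beta'beta: the t at which the constraint binds."""
    return beta @ beta

lam, h = 5.0, 1e-3
b_plus, b_minus = ridge(lam + h), ridge(lam - h)

# Central differences along the path, then the ratio dV/dt.
dV = rss(b_plus) - rss(b_minus)
dt = norm2(b_plus) - norm2(b_minus)
slope = dV / dt
print(slope)  # expected to be close to -lam
```

The sign is negative because relaxing the constraint (larger $t$) can only lower the minimised residual sum of squares.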

seanv507