I have a question regarding shrinkage methods. I am currently writing a term paper about ridge regression and the lasso, and before explaining the two methods I want to give some theory on why shrinking the coefficients can help to improve prediction results.

I found a very cool Coursera course on machine learning from the University of Washington, which explains the whole concept very well. In it, they use one formula for the total cost of a regression, which seems to me to be the best justification for shrinkage:

Total Cost = measure of fit + measure of magnitude of coefficients 

However, the formula is not derived and I don't know why it holds. I tried to google it, but I could not find this formula anywhere other than in that course. It seems to me that this is some kind of decomposition of the total cost function of a regression.

Does anyone have an idea how this formula is justified/derived?

kjetil b halvorsen
Benkyozamurai

2 Answers

Lasso and ridge regression can be interpreted in a Bayesian framework as maximum a posteriori (MAP) estimation: the regularization term comes from the prior on the coefficients. For the details, see these notes or the first chapters of Statistical Learning with Sparsity.
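
As a sketch of how the MAP view produces the "fit + magnitude" form, assume a Gaussian likelihood and an independent zero-mean Gaussian prior on the coefficients. The symbols $ \sigma^2 $, $ \tau^2 $, $ b $ and $ \lambda $ are my own notation, not taken from the notes or the book:

$$
y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad \beta \sim \mathcal{N}(0, \tau^2 I)
$$

$$
\hat\beta_{\text{MAP}} = \arg\max_\beta \, p(\beta \mid y) = \arg\min_\beta \left[ -\log p(y \mid \beta) - \log p(\beta) \right] = \arg\min_\beta \left[ \frac{1}{2\sigma^2} \|y - X\beta\|_2^2 + \frac{1}{2\tau^2} \|\beta\|_2^2 \right]
$$

Multiplying by $ 2\sigma^2 $ does not change the minimizer and gives the ridge objective:

$$
\hat\beta_{\text{MAP}} = \arg\min_\beta \left[ \underbrace{\|y - X\beta\|_2^2}_{\text{measure of fit}} + \lambda \underbrace{\|\beta\|_2^2}_{\text{magnitude of coefficients}} \right], \qquad \lambda = \frac{\sigma^2}{\tau^2}
$$

Under the same assumptions, replacing the Gaussian prior with a Laplace prior $ p(\beta_j) \propto \exp(-|\beta_j| / b) $ gives the lasso penalty $ \lambda \|\beta\|_1 $ with $ \lambda = 2\sigma^2 / b $.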

There is also a derivation for linear regression that is based on assuming that the predictor variables are noisy. The notes for Hinton's Coursera course contain the details (Lecture 9c, "Using noise as a regularizer").
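
The noisy-predictor argument can be checked numerically. The sketch below is my own toy example, not code from the lecture: a single predictor with no intercept, made-up data, and Gaussian noise with standard deviation `noise_sd` added to the predictor. In expectation, least squares on the noisy predictor minimizes the squared error plus `n * noise_sd**2` times the squared coefficient, so it should roughly match ridge on the clean data with that penalty.

```python
import random

random.seed(0)

# Hypothetical 1-D data set (no intercept): y = 1.0 * x + noise.
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 * x + random.gauss(0.0, 0.5) for x in xs]

n = len(xs)
noise_sd = 0.5   # std. dev. of the noise added to the predictor (assumed)
copies = 5000    # number of noisy replicates of the data set (assumed)

# Ordinary least squares fitted on many noisy copies of the predictor.
num = 0.0
den = 0.0
for _ in range(copies):
    for x, y in zip(xs, ys):
        x_noisy = x + random.gauss(0.0, noise_sd)
        num += x_noisy * y
        den += x_noisy * x_noisy
beta_noisy = num / den

# Ridge on the clean data with penalty lam = n * noise_sd**2.
lam = n * noise_sd ** 2
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
beta_ridge = sxy / (sxx + lam)
beta_ols = sxy / sxx  # unpenalized fit on the clean data, for comparison

print("OLS on clean data:  ", beta_ols)
print("OLS on noisy inputs:", beta_noisy)
print("ridge on clean data:", beta_ridge)
```

The two shrunken estimates should agree up to simulation error, and both should be smaller in magnitude than the unpenalized estimate.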

Jakub Bartczuk
Shrinkage can improve prediction because of the bias-variance tradeoff: a small increase in bias (the penalty) in exchange for a larger reduction in variance (the model is less sensitive to noise or correlation of predictors).
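
A small simulation can make the tradeoff concrete. This is a minimal sketch under assumptions of my own choosing (one predictor, no intercept, fixed design, true coefficient 1, and the values of `sigma` and `lam` below). It compares the bias, variance and mean squared error of the least-squares and ridge coefficient estimates over repeated samples.

```python
import random

random.seed(1)

beta_true = 1.0
sigma = 2.0                              # noise std. dev. (assumed)
lam = 4.0                                # ridge penalty (assumed)
xs = [i / 10 for i in range(-10, 11)]    # fixed design, no intercept
sxx = sum(x * x for x in xs)

ols, ridge = [], []
for _ in range(5000):
    # Draw a fresh response vector and fit both estimators to it.
    ys = [beta_true * x + random.gauss(0.0, sigma) for x in xs]
    sxy = sum(x * y for x, y in zip(xs, ys))
    ols.append(sxy / sxx)            # least-squares estimate
    ridge.append(sxy / (sxx + lam))  # ridge estimate, shrunk toward zero


def summarize(estimates):
    """Return (bias, variance, mse) of the estimates around beta_true."""
    mean = sum(estimates) / len(estimates)
    var = sum((b - mean) ** 2 for b in estimates) / len(estimates)
    mse = sum((b - beta_true) ** 2 for b in estimates) / len(estimates)
    return mean - beta_true, var, mse


bias_ols, var_ols, mse_ols = summarize(ols)
bias_ridge, var_ridge, mse_ridge = summarize(ridge)

print("OLS:   bias=%.3f var=%.3f mse=%.3f" % (bias_ols, var_ols, mse_ols))
print("ridge: bias=%.3f var=%.3f mse=%.3f" % (bias_ridge, var_ridge, mse_ridge))
```

In this setup the ridge estimate is biased toward zero, but its variance drops by more than the squared bias it adds, so its mean squared error is lower. With a single predictor, the excess prediction error at a new point $ x_0 $ is $ x_0^2 $ times this coefficient error, so the same ordering carries over to prediction.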

Another way to motivate the expression you wrote is to consider the connection between constraining some norm of the coefficients to a maximum size and an equivalent unconstrained (penalized) optimization problem. See "Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT".
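
A brief sketch of that connection for the $ {L}_{2} $ case, in my own notation ($ t $ is the budget on the squared norm, $ \lambda $ the Lagrange multiplier). The constrained problem is

$$
\min_\beta \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_2^2 \le t
$$

with Lagrangian

$$
\mathcal{L}(\beta, \lambda) = \|y - X\beta\|_2^2 + \lambda \left( \|\beta\|_2^2 - t \right), \qquad \lambda \ge 0
$$

For a fixed $ \lambda $ the term $ -\lambda t $ is constant in $ \beta $, so minimizing $ \mathcal{L} $ over $ \beta $ is the penalized problem "fit + $ \lambda \, \cdot $ magnitude". The KKT conditions are

$$
-2 X^\top (y - X\beta) + 2 \lambda \beta = 0 \;\Rightarrow\; \hat\beta = (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda \left( \|\hat\beta\|_2^2 - t \right) = 0
$$

so whenever the constraint is active, each budget $ t $ corresponds to some $ \lambda > 0 $ for which the penalized solution coincides with the constrained one.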

Sycorax