Total Cost Shrinkage

Question

I have a question regarding shrinkage methods. I am currently writing a term paper about ridge regression and lasso, and before explaining the two methods I want to give some theory on why shrinking the coefficients can help to improve prediction results.

I found a very cool Coursera course on machine learning from the University of Washington, which explains the whole concept very well. In it they use one formula for the total cost of a regression, which seems to me to be the best justification for using shrinkage:

Total Cost = measure of fit + measure of magnitude of coefficients

However, the formula is not derived and I don't know why it holds. I tried to google it, but I could not find this formula anywhere other than in that course. It seems to me that this is some kind of decomposition of the total cost function of a regression.

Does anyone have an idea how this formula is justified/derived?

score 1 · Answer 1 · answered Sep 03 '17 at 10:19

Lasso and ridge regression can be interpreted in a Bayesian framework as maximum a posteriori (MAP) estimation, where the regularization term comes from the prior on the coefficients. For the details, see these notes or the first chapters of Statistical Learning with Sparsity.
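
As a sketch of the MAP argument (the Gaussian noise model and the symbols $ \sigma^2 $, $ \tau^2 $, $ b $ and $ \lambda $ are assumptions introduced here for illustration, not taken from the linked notes): with likelihood $ y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I) $ and a prior $ p(w) $ on the coefficients,

$$
\hat{w}_{\mathrm{MAP}} = \arg\max_w \, p(w \mid y, X) = \arg\min_w \left[ \frac{1}{2\sigma^2} \| y - Xw \|_2^2 - \log p(w) \right].
$$

An independent Gaussian prior $ w_j \sim \mathcal{N}(0, \tau^2) $ gives $ -\log p(w) = \frac{1}{2\tau^2} \| w \|_2^2 + \text{const} $, and an independent Laplace prior $ p(w_j) \propto \exp(-|w_j| / b) $ gives $ -\log p(w) = \frac{1}{b} \| w \|_1 + \text{const} $. Multiplying the objective by $ 2\sigma^2 $ (which does not change the minimizer) yields

$$
\text{ridge: } \| y - Xw \|_2^2 + \lambda \| w \|_2^2, \quad \lambda = \frac{\sigma^2}{\tau^2}; \qquad \text{lasso: } \| y - Xw \|_2^2 + \lambda \| w \|_1, \quad \lambda = \frac{2\sigma^2}{b}.
$$

In both cases the objective has the form measure of fit + measure of magnitude of coefficients, with the first term coming from the likelihood and the second from the prior.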

There is also a derivation for linear regression based on the assumption that the predictor variables are noisy. The notes for Hinton's Coursera course contain the details (Lecture 9c, Using noise as a regularizer).
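
One common form of the noisy-predictor argument can be checked numerically. If independent zero-mean noise $ \varepsilon_i $ with variance $ \sigma^2 $ per coordinate is added to each predictor vector $ x_i $, then $ E[(y_i - (x_i + \varepsilon_i)^T w)^2] = (y_i - x_i^T w)^2 + \sigma^2 \| w \|_2^2 $, so the expected squared error over $ n $ observations is the ordinary residual sum of squares plus a ridge penalty with $ \lambda = n\sigma^2 $. The sketch below is only an illustration under these assumptions; the toy data, the noise level and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n observations, d predictors.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.3, size=n)

sigma = 0.5      # std. dev. of the noise injected into the predictors
copies = 2000    # number of noisy replicas of the data set

# Ordinary least squares on many noisy replicas of X (targets unchanged).
X_noisy = np.vstack(
    [X + rng.normal(scale=sigma, size=X.shape) for _ in range(copies)]
)
y_rep = np.tile(y, copies)
w_noisy, *_ = np.linalg.lstsq(X_noisy, y_rep, rcond=None)

# Ridge on the clean data with the matching penalty lambda = n * sigma^2.
lam = n * sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("least squares on noisy inputs:", w_noisy)
print("ridge with lambda = n*sigma^2:", w_ridge)
```

The two coefficient vectors should agree up to simulation error, and the agreement tightens as `copies` grows.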

score 0 · Answer 2 · answered Apr 24 '19 at 01:52

Shrinkage can improve prediction because of the bias-variance tradeoff: the penalty introduces a small increase in bias in exchange for a larger reduction in variance, since the fitted model is less sensitive to noise and to correlation among the predictors.
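
A small simulation can make the tradeoff concrete. The sketch below is a toy setup, not a general proof: the correlated design, the true coefficients, the noise level and the penalty `lam` are all hypothetical choices. It refits ordinary least squares and ridge on many data sets that differ only in the noise and compares the squared bias and the variance of the coefficient estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 5 strongly correlated predictors, fixed design matrix.
n, d, noise_sd, lam = 30, 5, 1.0, 5.0
shared = rng.normal(size=(n, 1))
X = shared + 0.1 * rng.normal(size=(n, d))   # columns share a common factor
w_true = np.array([1.0, 2.0, 0.0, -1.0, 1.0])

def fit(X, y, lam):
    """Closed-form ridge estimate; lam = 0 gives ordinary least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Refit both estimators on many data sets that differ only in the noise.
ols, ridge = [], []
for _ in range(2000):
    y = X @ w_true + rng.normal(scale=noise_sd, size=n)
    ols.append(fit(X, y, 0.0))
    ridge.append(fit(X, y, lam))
ols, ridge = np.array(ols), np.array(ridge)

def bias2_and_variance(estimates):
    """Squared bias and total variance of the coefficient estimates."""
    bias2 = np.sum((estimates.mean(axis=0) - w_true) ** 2)
    variance = np.sum(estimates.var(axis=0))
    return bias2, variance

bias2_ols, var_ols = bias2_and_variance(ols)
bias2_ridge, var_ridge = bias2_and_variance(ridge)
print(f"OLS:   bias^2 = {bias2_ols:.3f}, variance = {var_ols:.3f}")
print(f"ridge: bias^2 = {bias2_ridge:.3f}, variance = {var_ridge:.3f}")
```

Whether the variance reduction outweighs the added bias depends on the data and on the penalty, which is why `lam` is usually chosen by cross-validation in practice.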

Another way to motivate the expression you wrote is to consider the connection between setting a maximum size on some norm of the coefficients and an equivalent unconstrained optimization problem. See "Showing the Equivalence Between the $ {L}_{2} $ Norm Regularized Regression and $ {L}_{2} $ Norm Constrained Regression Using KKT".
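
A brief sketch of that connection for the $ {L}_{2} $ case (the bound $ t $ and the multiplier $ \lambda $ are symbols introduced here for illustration): the constrained problem is

$$
\min_w \| y - Xw \|_2^2 \quad \text{subject to} \quad \| w \|_2^2 \le t,
$$

with Lagrangian

$$
L(w, \lambda) = \| y - Xw \|_2^2 + \lambda \left( \| w \|_2^2 - t \right), \qquad \lambda \ge 0.
$$

The KKT stationarity condition is

$$
-2 X^T (y - Xw) + 2 \lambda w = 0,
$$

which is exactly the first-order condition of the penalized objective $ \| y - Xw \|_2^2 + \lambda \| w \|_2^2 $, since the term $ -\lambda t $ does not depend on $ w $. Complementary slackness, $ \lambda ( \| w \|_2^2 - t ) = 0 $, links the two parameters: if the constraint is inactive then $ \lambda = 0 $ and the solution is ordinary least squares; otherwise $ \lambda > 0 $ and the solution lies on the boundary $ \| w \|_2^2 = t $. So for every bound $ t $ there is a $ \lambda \ge 0 $ for which the penalized problem has the same solution.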
