I have a question regarding shrinkage methods. I am currently writing a term paper on ridge regression and the lasso, and before explaining the two methods I want to give some theory on why shrinking the coefficients can help improve prediction accuracy.
I found a very good Coursera course on machine learning from the University of Washington that explains the whole concept well. In it, they use one formula for the total cost of a regression that seems to justify shrinkage best:
Total Cost = measure of fit + measure of magnitude of coefficients
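To make the formula concrete for myself, I wrote a small sketch of what I think the two terms mean in the ridge case (this is my own notation, not the course's: "measure of fit" as the residual sum of squares, "measure of magnitude" as the squared L2 norm of the coefficients, with a tuning parameter `lam`):

```python
import numpy as np

def total_cost(X, y, w, lam):
    """Ridge-style total cost: RSS(w) + lam * ||w||^2 (my assumed reading)."""
    residuals = y - X @ w
    measure_of_fit = residuals @ residuals        # residual sum of squares
    measure_of_magnitude = w @ w                  # squared L2 norm of coefficients
    return measure_of_fit + lam * measure_of_magnitude

# Toy data to see the trade-off between the two terms.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

# Inflating the coefficients leaves the data unchanged but raises the
# magnitude penalty, so the total cost grows.
print(total_cost(X, y, true_w, lam=1.0))
print(total_cost(X, y, 10 * true_w, lam=1.0))
```

With `lam = 0` this reduces to ordinary least squares, which is part of what I would like to understand: where the added penalty term comes from.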
However, the formula is not derived, and I don't know why it holds. I tried to Google it, but I could not find this formula anywhere other than in that course. It seems to me that it is some kind of decomposition of the total cost function of a regression.
Does anyone have an idea how this formula is justified/derived?