What is the basic argument on which ridge and lasso regression are based? I went through the Tikhonov regularization wiki, where it is mentioned that
In many cases, the Tikhonov matrix is chosen as the identity matrix, giving preference to solutions with smaller norms. In other cases, low-pass operators (e.g., a difference operator or a weighted Fourier operator) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous.
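For concreteness, the formulation I am looking at (writing out the standard Tikhonov form; the notation $X$, $y$, $\beta$, $\Gamma$, $\lambda$ is mine, not quoted from the article) is

$$\hat{\beta} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \|\Gamma \beta\|_2^2,$$

so choosing $\Gamma = \sqrt{\lambda}\, I$ gives ridge regression with its penalty $\lambda \|\beta\|_2^2$ on the size of the coefficient vector, while replacing that penalty with $\lambda \|\beta\|_1$ gives the lasso.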
I want to understand why solutions with smaller norms are more appealing. Smoothness I can understand, but why smaller norms?