I have a simple regression model (y = param1*x1 + param2*x2). When I fit the model to my data, I find two good solutions:
Solution A, params=(2,7), is the best on the training set, with RMSE=2.5.
BUT! Solution B, params=(24,20), wins big on the validation set when I do cross-validation.
Solution A is surrounded by bad solutions, so when I use solution A the model is very sensitive to variations in the data.
Solution B is surrounded by OK solutions, so it's less sensitive to changes in the data.
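A crude way to quantify what I mean by "surrounded by bad/OK solutions" (this is my own made-up measure, run on toy data I invented here, not my real data): compare the RMSE at the solution itself with the average RMSE over random perturbations of its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for mine (the real data is different).
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 2.0 * x1 + 7.0 * x2 + rng.normal(scale=2.5, size=200)

def rmse(params):
    p1, p2 = params
    return np.sqrt(np.mean((y - (p1 * x1 + p2 * x2)) ** 2))

def neighbourhood_rmse(params, radius=1.0, n=500):
    # Average RMSE over random perturbations of the parameters:
    # a sharp, isolated solution scores much worse here than at its own point.
    perturbed = np.asarray(params, float) + rng.uniform(-radius, radius, size=(n, 2))
    return float(np.mean([rmse(p) for p in perturbed]))

print(rmse((2, 7)), neighbourhood_rmse((2, 7)))
```

The gap between the point RMSE and the neighbourhood RMSE is what I mean by "sensitive": for my solution A it's big, for solution B it's small.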
Is this a brand new theory I've just invented, that solutions with good neighbours overfit less? :))
Are there generic optimisation methods that would help me favour solutions like B over solutions like A?
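The only idea I've come up with myself (no idea if this is a standard method, the function names and the radius are all made up by me): instead of minimising the loss at a single point, minimise its average over a parameter neighbourhood, so sharp isolated minima look worse than broad flat ones. A sketch on toy data:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Toy data again, standing in for mine.
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 2.0 * x1 + 7.0 * x2 + rng.normal(scale=2.5, size=200)

def rmse(params):
    p1, p2 = params
    return np.sqrt(np.mean((y - (p1 * x1 + p2 * x2)) ** 2))

# Fixed perturbations so the smoothed objective is deterministic
# (drawing fresh noise on every call would confuse the optimiser).
NOISE = rng.uniform(-1.0, 1.0, size=(100, 2))

def smoothed_rmse(params):
    # Average the loss over a neighbourhood of the parameters.
    return float(np.mean([rmse(np.asarray(params) + d) for d in NOISE]))

result = minimize(smoothed_rmse, x0=[0.0, 0.0], method="Nelder-Mead")
print(result.x)
```

Is there a proper name for this kind of thing, or a less hacky way to do it?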
HELP!