
I am working with linear regression methods. A weakness of these methods is the possibility of overfitting, so some papers add a regularization term to reduce it. Are there other methods to reduce overfitting? Can we use a prior term to reduce overfitting?

Given $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, the linear regression model for the data $D$ is:

$$h(x)=w^\top x+b$$

To reduce overfitting we add a regularization term, so the loss function is:

$$J=\sum_{i=1}^n\big(h(x_i)-y_i\big)^2+\lambda_1\sum_j w_j^2$$

But finding a good $\lambda_1$ is hard. Can we avoid it by using other terms that give more effective results? Thanks.

Nick Stauner
John
  • Can you clarify your question / the thinking behind it? Note that regularization can be seen as the application of a prior centered on 0. – gung - Reinstate Monica May 21 '14 at 17:52
  • @gung: Yes. My question is how to achieve a small error when applying linear regression to a given data set, because the problem with linear regression is overfitting. So we want to replace that term with another one. How can we avoid it? – John May 21 '14 at 17:55
  • @gung: Please see my edited question – John May 21 '14 at 18:05

3 Answers


You can estimate an optimal lambda as the one that minimizes testing error during cross-validation. Testing error (i.e. mean squared prediction error on a held-out testing set) should decrease as lambda increases from zero, because the training data are less and less overfit, but beyond a certain point it will increase again as the model no longer captures the data adequately. A conservative choice of lambda is the one whose testing error is one standard error above the minimum testing error, on the side of the higher lambda value.
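
For concreteness, here is a minimal sketch of this procedure, assuming Python with NumPy and scikit-learn; the toy data, the lambda grid, and the 5-fold split are placeholders chosen for the example:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Toy data standing in for the question's D = {(x_i, y_i)}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

# Grid of candidate lambdas (scikit-learn calls the penalty weight "alpha")
lambdas = np.logspace(-3, 3, 25)
mean_mse, se_mse = [], []
for lam in lambdas:
    fold_mse = -cross_val_score(Ridge(alpha=lam), X, y,
                                scoring="neg_mean_squared_error", cv=5)
    mean_mse.append(fold_mse.mean())
    se_mse.append(fold_mse.std(ddof=1) / np.sqrt(len(fold_mse)))

mean_mse, se_mse = np.array(mean_mse), np.array(se_mse)
i_min = mean_mse.argmin()

# One-standard-error rule: largest lambda whose CV error is still
# within one standard error of the minimum CV error
threshold = mean_mse[i_min] + se_mse[i_min]
lambda_1se = lambdas[np.where(mean_mse <= threshold)[0].max()]
print("lambda at minimum CV error:", lambdas[i_min])
print("lambda by one-SE rule:     ", lambda_1se)
```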

drollix

I also suggest using weights $v_i$ in the loss function in case you have (or can obtain) some "measures/estimates of trust" for particular data points, such as variance estimates. The loss function would then be:

$$J=\sum_{i=1}^N v_i\,(h(x_i)-y_i)^2+\lambda\sum_j w_j^2$$

The weights should be normalized, so they must satisfy $$\sum_{i=1}^N v_i = 1$$

The less confidence you have in a data point $(x_i, y_i)$, the smaller you choose $v_i$, decreasing that point's contribution to the fit. The more you trust an observation, the larger you choose $v_i$, increasing its contribution while fitting.

In my opinion, this approach can help reduce overfitting.

This is a form of weighted least squares; when the weights depend on the distance to the point being predicted, it is called locally weighted regression (LWR).
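
A minimal closed-form sketch of such a weighted, penalized fit, assuming NumPy (the trust weights `v`, the toy data, and the unpenalized intercept are choices made for the illustration):

```python
import numpy as np

def weighted_ridge(X, y, v, lam):
    """Minimize sum_i v_i (w.x_i + b - y_i)^2 + lam * ||w||^2 (intercept b unpenalized)."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])   # append a column of ones for the intercept b
    V = np.diag(v / v.sum())               # normalize the trust weights so they sum to 1
    P = lam * np.eye(d + 1)
    P[-1, -1] = 0.0                        # do not penalize the intercept
    theta = np.linalg.solve(Xb.T @ V @ Xb + P, Xb.T @ V @ y)
    return theta[:-1], theta[-1]           # (w, b)

# Toy example: down-weight the last, least trusted observation
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
v = np.ones(50)
v[-1] = 0.1                                # low confidence in the last point
w, b = weighted_ridge(X, y, v, lam=1.0)
print("w:", w, "b:", b)
```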

konstunn

I would also suggest looking at other methods that are commonly used to prevent overfitting; these include splitting your dataset into training/testing/validation sets and performing cross-validation. There is a ton of material on this (see What is the difference between test set and validation set?), and these techniques will greatly help your model generalize to new data.

Also, overfitting can easily occur if your features do not generalize well. For example, if you have 10 data points and fit them with a model that has 10 free parameters (say, a degree-9 polynomial), you get a perfect, and badly overfitted, fit; new data will not fit that model well. Therefore, cross-validation and plotting bias/variance curves will help.
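
To illustrate that last point, here is a small sketch, assuming Python/NumPy, with sine-plus-noise data invented for the example: a polynomial with 10 coefficients interpolates the 10 training points exactly but does poorly on fresh data from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=10)

# A degree-9 polynomial has 10 coefficients: it interpolates the 10 points exactly
coeffs = np.polyfit(x, y, deg=9)
train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)

# Fresh data from the same process expose the overfit
x_new = rng.uniform(0, 1, size=100)
y_new = np.sin(2 * np.pi * x_new) + rng.normal(scale=0.1, size=100)
test_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)

print(f"train MSE: {train_mse:.2e}, test MSE: {test_mse:.2e}")
```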

mike1886