I want to know the relationship between over-fitting of a statistical model (a regression model) and the number of parameters to be estimated. This may be a fundamental question, but I would appreciate it if someone could explain.
1 Answer
An exact answer depends on the particular statistical model and the dimensionality of the data. In general, however, the more parameters a model has, the larger the class of functions it can represent. A common assumption is that the function generating the data is simpler than the random noise corrupting the training samples, so a smaller model (in terms of the number of parameters) will be able to represent the desired function but not the noise. Fitting the noise in the training samples is overfitting.
I will give two specific examples:
1. Linear regression with a polynomial model
The model is:
$y= a_0 + a_1 x + a_2x^2+\ldots +a_nx^n$
With its $n+1$ parameters, this model is able to fit exactly any consistent dataset of $n$ training samples. Consistent means there are no two samples with the same $x$ but different $y$.
This means that if the number of parameters is greater than or equal to the number of training samples, you are guaranteed to overfit. However, if the data was generated by a polynomial of degree $m$ with $m < n$, a certain level of overfitting will also occur whenever the model polynomial has degree $k$ with $m < k < n$, even though such a model cannot fit the training data perfectly.
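To make this concrete, here is a minimal sketch (not part of the argument above; the quadratic ground truth, noise level, sample size, and polynomial degrees are all arbitrary choices) that fits polynomials of increasing degree to noisy data and compares training and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: a degree-2 polynomial, observed with Gaussian noise
def true_f(x):
    return 1.0 - 2.0 * x + 0.5 * x**2

n_train, n_test = 15, 200
x_train = rng.uniform(-1, 1, n_train)
y_train = true_f(x_train) + rng.normal(0, 0.3, n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = true_f(x_test) + rng.normal(0, 0.3, n_test)

# Degree 14 has 15 parameters = number of training points, so it interpolates them
for degree in (1, 2, 6, 14):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

In a setup like this, the training error keeps shrinking as the degree grows, while the test error typically starts to rise once the degree exceeds that of the generating polynomial.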
2. Neural networks
Zhang et al. (2017), in their paper "Understanding deep learning requires rethinking generalization", show that a simple two-layer neural network with $2n+d$ parameters is capable of perfectly fitting any dataset of $n$ samples of dimension $d$.
However, note that while commonly used neural networks have far more than $2n+d$ parameters, they do not necessarily overfit: in practice the global minimum of the loss function (which would correspond to modeling the noise in the training data) is typically never reached, and there are many well-studied regularization methods (early stopping, to give one example) that prevent overfitting. Moreover, the paper mentioned above also includes an interesting discussion of still poorly understood properties of deep neural networks that help prevent overfitting.
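As an illustration of one such regularizer, here is a minimal sketch of early stopping, assuming a small two-layer network trained with plain gradient descent in NumPy; the architecture, learning rate, and patience are arbitrary choices for illustration, not anything prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: a smooth target plus noise, split into train/validation
def target(x):
    return np.sin(3 * x)

x = rng.uniform(-1, 1, (200, 1))
y = target(x) + rng.normal(0, 0.2, (200, 1))
x_tr, y_tr = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

# Two-layer network: prediction = tanh(x W1 + b1) W2 + b2
h = 50
W1 = rng.normal(0, 1.0, (1, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 1.0 / np.sqrt(h), (h, 1)); b2 = np.zeros(1)

def forward(x):
    a = np.tanh(x @ W1 + b1)            # hidden activations
    return a, a @ W2 + b2               # (activations, predictions)

lr, patience = 0.05, 200
best_val, since_best, best_weights = np.inf, 0, None

for step in range(20000):
    a, pred = forward(x_tr)
    err = pred - y_tr
    # Backpropagation for the squared-error loss (the constant factor is absorbed into lr)
    gW2 = a.T @ err / len(x_tr); gb2 = err.mean(axis=0)
    da = (err @ W2.T) * (1 - a ** 2)    # derivative through tanh
    gW1 = x_tr.T @ da / len(x_tr); gb1 = da.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # Early stopping: track the validation loss and remember the best weights
    val_loss = np.mean((forward(x_val)[1] - y_val) ** 2)
    if val_loss < best_val:
        best_val, since_best = val_loss, 0
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:
        since_best += 1
        if since_best > patience:
            break

W1, b1, W2, b2 = best_weights           # restore the weights that generalized best
print(f"stopped at step {step}, best validation MSE = {best_val:.4f}")
```

The idea is simply to monitor the loss on a held-out validation set, keep the weights from the point where it was lowest, and stop once it has not improved for a while, so training never reaches the minimum where the noise is fitted.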

- In fact, there are 30 parameters to be estimated in my model and I have only 30 data points. According to your explanation, overfitting could happen due to the lack of data points. I will try to generate more data points by simulation and then fit the model. Thank you so much for your valuable explanation. – Lank Dec 27 '17 at 00:48
- In your scenario I would do two more things: 1) make sure your model is not too complex for the problem, and 2) try incorporating your prior knowledge into the model (in the form of a regularizer): often you know more about the problem than just having a handful of data points. Instead of forcing the model to extract all of its knowledge from the data alone, you can, for example, penalize unlikely parameter combinations (see the sketch after these comments). – Jan Kukacka Dec 27 '17 at 10:28
- Unfortunately I cannot upvote because I do not yet have the privilege to vote. I am sorry; I would surely vote for you if I had the privilege. – Lank Dec 28 '17 at 05:37
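A minimal sketch of the regularization idea from the comments above, with ridge regression standing in for the general notion of penalizing unlikely parameter combinations (an L2 penalty, i.e. a Gaussian prior that large coefficients are unlikely); the data, the sparsity of the true weights, and the penalty strength are all arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# 30 data points, 30 parameters: the situation described in the comment above.
n, p = 30, 30
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.0, 0.5]          # only a few parameters really matter
y = X @ true_w + rng.normal(0, 0.5, n)

# Ordinary least squares: with p = n it interpolates the noise exactly.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: the L2 penalty shrinks the estimate toward zero,
# encoding the prior belief that large parameter values are unlikely.
lam = 5.0                               # penalty strength (arbitrary here)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("distance to true weights, OLS  :", np.linalg.norm(w_ols - true_w))
print("distance to true weights, ridge:", np.linalg.norm(w_ridge - true_w))
```

Even though the model has as many parameters as data points, the penalty keeps the estimate from chasing the noise that the unpenalized fit interpolates exactly.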