I know about overfitting and underfitting in the machine-learning context, and what generalisation means as well. But recently I was introduced to an uncommon term, "overgeneralization", in the context of fitting. What does this term relate to? Underfitting? Overfitting? Something else entirely?
1 Answer
From what I understand, over-generalization means inferring that your regression estimator or classifier has a smaller generalization error on the population of interest than it actually does.
Here's an example. Say we come up with a model that uses metabolic markers to classify strains of E. coli. We cross-validate on our available lab data to make sure we are neither under-fitting nor over-fitting, and this cross-validation gives us an estimate of the generalization error. Perhaps, after some refinement, we arrive at a model with a sufficiently small training error (to avoid under-fitting) and a sufficiently small test error (to avoid over-fitting). At this point we may believe the model has a lot of potential and broad generalizability. However, if we then test it on wild, non-lab samples of bacteria, we may be disappointed by the results. We likely overgeneralized the applicability of our model because our training data were not representative of the population of interest. Such inaccurate estimation of the generalization error is especially likely when the population is heterogeneous and the sample size is small.
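A small simulation can make this failure mode concrete. The sketch below is purely hypothetical (invented "markers", not the E. coli study above): cross-validation on an unrepresentative "lab" sample reports high accuracy because the model leans on a marker whose association with the class exists only under lab conditions, and accuracy then drops sharply on the "wild" population we actually care about.

```python
# Hypothetical illustration of over-generalization: the cross-validated error
# on a non-representative "lab" sample understates the error on the population.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def lab_sample(n):
    """Small lab sample: marker 2 is strongly, but spuriously, tied to the class."""
    y = rng.integers(0, 2, n)
    marker1 = rng.normal(1.0 * y, 1.0)   # weak, genuine signal
    marker2 = rng.normal(3.0 * y, 0.5)   # strong signal only under lab conditions
    return np.column_stack([marker1, marker2]), y

def wild_sample(n):
    """Wild population: marker 2 carries no information about the class."""
    y = rng.integers(0, 2, n)
    marker1 = rng.normal(1.0 * y, 1.0)   # same genuine signal as in the lab
    marker2 = rng.normal(1.5, 2.0, n)    # independent of the class
    return np.column_stack([marker1, marker2]), y

X_lab, y_lab = lab_sample(200)
X_wild, y_wild = wild_sample(5000)

clf = LogisticRegression(max_iter=1000)

# Cross-validated accuracy on the lab data: looks excellent ...
cv_acc = cross_val_score(clf, X_lab, y_lab, cv=5).mean()

# ... but accuracy on the population of interest is far lower.
wild_acc = clf.fit(X_lab, y_lab).score(X_wild, y_wild)

print(f"cross-validated accuracy (lab sample): {cv_acc:.2f}")
print(f"accuracy on the wild population:       {wild_acc:.2f}")
```

With this construction the cross-validated estimate will typically be close to perfect while the wild-sample accuracy falls toward chance level, which is exactly the over-generalization described above: the estimate of the generalization error was accurate for the lab-like data, just not for the population the model was meant for.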
Reference: Izenman, A. J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. New York: Springer.
