Questions tagged [overfitting]

Overfitting means modeling error (especially sampling error) rather than replicable, informative relationships among variables. It improves model fit statistics but reduces parsimony and worsens explanatory and predictive validity.

Models that involve complex polynomial functions or too many independent variables may fit a particular sample's covariance structure too well. In that case some existing terms (and any additional ones) increase model fit by modeling sampling error rather than systematic covariance that is likely to replicate or to represent theoretically useful relationships. When an overfitted model is used to predict other data (e.g., future outcomes, out-of-sample data), prediction error increases.
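A minimal sketch of the polynomial case, assuming NumPy is available. The data-generating function (a sine curve plus Gaussian noise), the sample sizes, the noise level, and the degrees compared are all illustrative assumptions, not taken from any of the questions below. Because the polynomial models are nested, in-sample error can only go down as the degree rises; out-of-sample error typically does not follow.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Draw n points from an assumed process: y = sin(pi * x) + noise."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

# Small training sample, large held-out sample from the same process.
x_train, y_train = make_data(20)
x_test, y_test = make_data(1000)

results = {}
for degree in (1, 3, 12):
    # Ordinary least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train),
                       mse(coefs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

The gap between the training and held-out error of the highest-degree fit is the symptom described above: the extra terms are fitting the noise in the 20 training points.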

The Wikipedia page on overfitting offers illustrations, lists of potential solutions, and special treatment of the topic as it relates to machine learning. See also:

Leinweber, D. J. (2007). Stupid data miner tricks: Overfitting the S&P 500. The Journal of Investing, 16(1), 15–22. Available online: http://www.finanzaonline.com/forum/attachments/econometria-e-modelli-di-trading-operativo/903701d1213616349-variazione-della-vix-e-rendimento-dello-s-p500-dataminejune_2000.pdf (accessed January 6, 2014).

Tetko, I. V., Livingstone, D. J., & Luik, A. I. (1995). Neural network studies. 1. Comparison of overfitting and overtraining. Journal of Chemical Information and Computer Sciences, 35(5), 826–833. doi:10.1021/ci00027a006.

878 questions
115
votes
21 answers

What's a real-world example of "overfitting"?

I kind of understand what "overfitting" means, but I need help coming up with a real-world example of overfitting.
user3851283
101
votes
6 answers

How is it possible that validation loss is increasing while validation accuracy is increasing as well

I am training a simple neural network on the CIFAR10 dataset. After some time, validation loss started to increase, while validation accuracy kept increasing as well. The test loss and test accuracy continue to improve. How is this possible? It seems…
66
votes
4 answers

Random Forest - How to handle overfitting

I have a computer science background but am trying to teach myself data science by solving problems on the internet. I have been working on this problem for the last couple of weeks (approx 900 rows and 10 features). I was initially using logistic…
Abhi
64
votes
6 answers

Is ridge regression useless in high dimensions ($n \ll p$)? How can OLS fail to overfit?

Consider a good old regression problem with $p$ predictors and sample size $n$. The usual wisdom is that the OLS estimator will overfit and will generally be outperformed by the ridge regression estimator: $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top…
amoeba
56
votes
4 answers

What should I do when my neural network doesn't generalize well?

I'm training a neural network and the training loss decreases, but the validation loss doesn't, or it decreases much less than what I would expect, based on references or experiments with very similar architectures and data. How can I fix this? As…
DeltaIV
40
votes
6 answers

How does cross-validation overcome the overfitting problem?

Why does a cross-validation procedure overcome the problem of overfitting a model?
user3269
38
votes
9 answers

Is overfitting "better" than underfitting?

I've understood the main concepts behind overfitting and underfitting, even though some reasons as to why they occur might not be as clear to me. But what I am wondering is: isn't overfitting "better" than underfitting? If we compare how well the…
37
votes
2 answers

Dealing with singular fit in mixed models

Let's say we have a model mod <- Y ~ X*Condition + (X*Condition|subject) # Y = logit variable # X = continuous variable # Condition = values A and B, dummy coded; the design is repeated # so all participants go through both…
User33268
35
votes
5 answers

Overfitting a logistic regression model

Is it possible to overfit a logistic regression model? I saw a video saying that if my area under the ROC curve is higher than 95%, then it's very likely to be overfitted, but is it possible to overfit a logistic regression model?
carlosedubarreto
33
votes
5 answers

Is an overfitted model necessarily useless?

Assume that a model has 100% accuracy on the training data, but 70% accuracy on the test data. Is the following argument true about this model? It is obvious that this is an overfitted model. The test accuracy can be enhanced by reducing the…
Hossein
32
votes
2 answers

Does it make sense to combine PCA and LDA?

Assume I have a dataset for a supervised statistical classification task, e.g., via a Bayes' classifier. This dataset consists of 20 features and I want to boil it down to 2 features via dimensionality reduction techniques such as Principal…
user39663
31
votes
9 answers

How can we explain the "bad reputation" of higher-order polynomials?

We all must have heard it by now - when we start learning about statistical models overfitting data, the first example we are often given is about "polynomial functions" (e.g., see the picture here): We are warned that although higher-degree…
stats_noob
31
votes
2 answers

Can overfitting and underfitting occur simultaneously?

I am trying to understand overfitting and underfitting better. Consider a data generating process (DGP) $$ Y=f(X)+\varepsilon $$ where $f(\cdot)$ is a deterministic function, $X$ are some regressors and $\varepsilon$ is a random error term…
Richard Hardy
31
votes
5 answers

Why do smaller weights result in simpler models in regularization?

I completed Andrew Ng's Machine Learning course around a year ago, and am now writing my High School Math exploration on the workings of Logistic Regression and techniques to optimize on performance. One of these techniques is, of course,…
30
votes
0 answers

When wouldn't I use LASSO for model selection?

Assume that you need to build a linear model to make predictions for new observations, and that there is uncertainty about which subset of variables should be included in the model. You are only interested in making predictions, there is no theory…
D L Dahly