Questions tagged [bias-variance-tradeoff]

In predictive modeling, unbiased models can have higher variance, and thus be less accurate. Modelers may prefer some bias to maximize accuracy. Use this tag also for questions about the bias-variance decomposition.

The bias-variance tradeoff is a fundamental issue in predictive modeling. Estimators and fitting algorithms that are unbiased (i.e., whose sampling distributions are centered on the true values) can have higher variance (i.e., their estimates can fall further from the true value in any given sample), and thus be less accurate. Modelers therefore often prefer somewhat biased models in order to maximize accuracy.
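For reference, the decomposition referred to above takes a standard form: for an estimator $\hat{\theta}$ of a quantity $\theta$, the mean squared error splits exactly into squared bias plus variance,

$$
\mathbb{E}\big[(\hat{\theta}-\theta)^2\big]
=\underbrace{\big(\mathbb{E}[\hat{\theta}]-\theta\big)^2}_{\text{bias}^2}
+\underbrace{\operatorname{Var}(\hat{\theta})}_{\text{variance}},
$$

so an estimator with a small bias but a much smaller variance can have lower overall error than an unbiased one.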

210 questions
122 votes · 8 answers

Bias and variance in leave-one-out vs K-fold cross validation

How do different cross-validation methods compare in terms of model variance and bias? My question is partly motivated by this thread: Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice? The answer…
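For readers who want to experiment, here is a minimal sketch (my own construction; the synthetic dataset and linear model are arbitrary stand-ins, not taken from the thread) comparing K-fold and leave-one-out error estimates with scikit-learn:

```python
# Compare cross-validation schemes on the same data and model.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)
model = LinearRegression()

for name, cv in [("5-fold", KFold(5, shuffle=True, random_state=0)),
                 ("10-fold", KFold(10, shuffle=True, random_state=0)),
                 ("LOOCV", LeaveOneOut())]:
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    print(f"{name:7s}: mean MSE estimate = {-scores.mean():.1f}")
```

Rerunning with different `random_state` values for the folds shows one piece of the picture: the LOOCV estimate is deterministic given the data, while the K-fold estimates vary with the fold assignment.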
59 votes · 2 answers

Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice?

Computing power considerations aside, are there any reasons to believe that increasing the number of folds in cross-validation leads to better model selection/validation (i.e. that the higher the number of folds the better)? Taking the argument to…
Amelio Vazquez-Reina · 17,546
59 votes · 7 answers

Intuitive explanation of the bias-variance tradeoff?

I am looking for an intuitive explanation of the bias-variance tradeoff, both in general and specifically in the context of linear regression.
NPE · 5,351
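One way to make the intuition concrete (a sketch of my own, with an arbitrary true function and noise level) is to refit an inflexible and a very flexible model on many resampled datasets and measure squared bias and variance directly:

```python
# Monte Carlo estimate of bias^2 and variance for polynomial fits.
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0.05, 0.95, 50)

def f(x):
    # True regression function.
    return np.sin(2 * np.pi * x)

def fit_predict(degree, n=30):
    # Draw a fresh training sample and return predictions on the fixed grid.
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, 0.3, n)
    return np.polyval(np.polyfit(x, y, degree), x_grid)

for degree in (1, 15):
    preds = np.array([fit_predict(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - f(x_grid)) ** 2)
    var = preds.var(axis=0).mean()
    print(f"degree {degree:2d}: bias^2 = {bias2:.3f}, variance = {var:.3f}")
```

The rigid degree-1 fit has large bias and small variance; the degree-15 fit inverts that, which is the tradeoff in miniature.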
50 votes · 4 answers

When is a biased estimator preferable to an unbiased one?

It is often obvious why one prefers an unbiased estimator. But are there any circumstances under which we might actually prefer a biased estimator over an unbiased one?
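A classic concrete case (my illustration, not drawn from the question itself): for normal data, the variance estimators that divide by $n$ or $n+1$ are biased downward, yet have lower mean squared error than the unbiased $n-1$ version.

```python
# MSE comparison of variance estimators that differ only in the divisor.
import numpy as np

rng = np.random.default_rng(1)
true_var, n = 4.0, 10
samples = rng.normal(0, np.sqrt(true_var), size=(100_000, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for divisor, label in [(n - 1, "unbiased (n-1)"),
                       (n, "MLE      (n)  "),
                       (n + 1, "shrunk   (n+1)")]:
    est = ss / divisor
    print(f"{label}: bias = {est.mean() - true_var:+.3f}, "
          f"MSE = {np.mean((est - true_var) ** 2):.3f}")
```

Dividing by $n+1$ minimizes MSE in this Gaussian setting, a textbook example of trading bias for variance.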
38 votes · 9 answers

Is overfitting "better" than underfitting?

I understand the main concepts behind overfitting and underfitting, even if some of the reasons they occur are not entirely clear to me. What I am wondering is: isn't overfitting "better" than underfitting? If we compare how well the…
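As a quick way to see both failure modes side by side (a sketch under my own assumptions; the excerpt's dataset is not specified, so a synthetic one stands in):

```python
# Train vs. test fit for an underfit, a reasonable, and an overfit model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 20):
    model = make_pipeline(PolynomialFeatures(degree),
                          LinearRegression()).fit(X_tr, y_tr)
    print(f"degree {degree:2d}: train R^2 = {model.score(X_tr, y_tr):.2f}, "
          f"test R^2 = {model.score(X_te, y_te):.2f}")
```

The underfit model is poor everywhere; the overfit one looks excellent on the training split and collapses on the held-out split, which is exactly the distinction the question turns on.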
33 votes · 2 answers

Understanding bias-variance tradeoff derivation

I am reading the chapter on the bias-variance tradeoff in The Elements of Statistical Learning and I don't understand the formula on page 29. Let the data arise from a model such that $$Y = f(x)+\varepsilon,$$ where $\varepsilon$ is a random number…
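For context, the formula in question follows from expanding the squared error and using that $\varepsilon$ has mean zero, variance $\sigma_\varepsilon^2$, and is independent of the fitted value $\hat f(x_0)$ (which depends only on the training sample):

$$
\begin{aligned}
\mathbb{E}\big[(Y-\hat f(x_0))^2\big]
&=\mathbb{E}\big[(f(x_0)+\varepsilon-\hat f(x_0))^2\big]\\
&=\sigma_\varepsilon^2
+\big(\mathbb{E}[\hat f(x_0)]-f(x_0)\big)^2
+\mathbb{E}\Big[\big(\hat f(x_0)-\mathbb{E}[\hat f(x_0)]\big)^2\Big],
\end{aligned}
$$

i.e. irreducible error plus squared bias plus variance; the cross terms vanish in expectation.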
19 votes · 2 answers

Why is best subset selection not favored in comparison to lasso?

I'm reading about best subset selection in The Elements of Statistical Learning. If I have 3 predictors $x_1,x_2,x_3$, I create $2^3=8$ subsets: the subset with no predictors, the subset with predictor $x_1$, the subset with predictor $x_2$, the subset with…
Ville · 739
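A small sketch of the contrast (my construction; three predictors as in the excerpt, everything else arbitrary): best subset scores each discrete candidate model separately, while the lasso traces out sparse fits along a continuous penalty path.

```python
# Best subset by exhaustive search vs. the lasso's penalized path.
from itertools import combinations

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=80, n_features=3, noise=5.0, random_state=0)

best = None
for k in (1, 2, 3):  # the empty model is skipped for brevity
    for idx in combinations(range(3), k):
        mse = -cross_val_score(LinearRegression(), X[:, list(idx)], y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        if best is None or mse < best[0]:
            best = (mse, idx)
print(f"best subset: predictors {best[1]}, CV MSE = {best[0]:.1f}")

for alpha in (0.1, 1.0, 10.0):  # larger alpha -> sparser, more shrunken fit
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"lasso alpha={alpha:5.1f}: coefficients = {np.round(coef, 2)}")
```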
15 votes · 2 answers

Can I (justifiably) train a second model only on the observations that a previous model predicted poorly?

Say I commit the following sins while building a predictive model: I take my dataset and split it into four subsets: three for training (Train_A, Train_B, and Train_C) and one for validation. I train an initial model (Model_A) on Train_A. Because…
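The statistically safer cousin of this idea is boosting: instead of selecting only the rows the first model got wrong, fit the second model to the first model's residuals on fresh data, so that every observation contributes. A minimal sketch (my own construction, with split names echoing the question's):

```python
# Stage-wise residual fitting on disjoint splits, evaluated on held-out data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_a, X_rest, y_a, y_rest = train_test_split(X, y, test_size=0.6, random_state=0)
X_b, X_val, y_b, y_val = train_test_split(X_rest, y_rest, test_size=0.5,
                                          random_state=0)

model_a = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_a, y_a)
resid = y_b - model_a.predict(X_b)   # what the first model got wrong on Train_B
model_b = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_b, resid)

pred = model_a.predict(X_val) + model_b.predict(X_val)
print("validation MSE, first model alone:",
      np.mean((y_val - model_a.predict(X_val)) ** 2))
print("validation MSE, with residual model:",
      np.mean((y_val - pred) ** 2))
```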
15 votes · 2 answers

Question about bias-variance tradeoff

I'm trying to understand the bias-variance tradeoff, the relationship between the bias of the estimator and the bias of the model, and the relationship between the variance of the estimator and the variance of the model. I came to these…
John M · 1,807
13 votes · 5 answers

Different usage of the term "Bias" in stats/machine learning

I think I've seen about 4 different usages of the word "bias" in stats/ML, and all these usages seem to be unrelated. I just wanted to get clarification that the usages are indeed unrelated. Here are the 4 I've seen: (1) "Bias"-variance…
11 votes · 1 answer

Modern machine learning and the bias-variance trade-off

I stumbled upon the paper Reconciling modern machine learning practice and the bias-variance trade-off and do not completely understand how the authors justify the double descent risk curve (see below) described in the paper. In the…
Samuel · 585
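A minimal random-features experiment (entirely my construction, loosely in the spirit of the paper's setup) in which the test error can trace the double-descent shape: it typically peaks near the interpolation threshold, where the number of features matches the number of training points, and falls again beyond it.

```python
# Min-norm least squares on random ReLU features of increasing width.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
X, X_test = rng.normal(size=(n, d)), rng.normal(size=(1000, d))
beta = rng.normal(size=d)
y = X @ beta + 0.5 * rng.normal(size=n)
y_test = X_test @ beta + 0.5 * rng.normal(size=1000)

for p in (10, 50, 100, 200, 1000):   # feature count; p = n is the threshold
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi, Phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]  # minimum-norm solution when p > n
    print(f"p = {p:4d}: test MSE = {np.mean((Phi_test @ w - y_test) ** 2):.2f}")
```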
11 votes · 4 answers

What is meant by Low Bias and High Variance of the Model?

I am new to the field of machine learning. From what I understand of the definition, bias represents how far your model parameters are from the true parameters of the underlying population: $$\operatorname{Bias}(\hat{\theta}_m) = E(\hat{\theta}_m) - \theta,$$ where…
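The quoted definitions can be checked numerically. A quick Monte Carlo sketch (my own; the plug-in variance estimator for a normal sample is an arbitrary example):

```python
# Estimate Bias(theta_hat) = E(theta_hat) - theta and Var(theta_hat) by simulation.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 4.0, 20, 100_000        # theta = true population variance
samples = rng.normal(0.0, np.sqrt(theta), size=(reps, n))
theta_hat = samples.var(axis=1)          # divides by n, so biased downward

print(f"bias     ~ {theta_hat.mean() - theta:+.3f}  (theory: {-theta / n:+.3f})")
print(f"variance ~ {theta_hat.var():.3f}")
```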
11 votes · 1 answer

Variance term in bias-variance decomposition of linear regression

In 'The Elements of Statistical Learning', the expression for the bias-variance decomposition of a linear model is given as $$Err(x_0)=\sigma_\epsilon^2+E[f(x_0)-E\hat f(x_0)]^2+\|h(x_0)\|^2\sigma_\epsilon^2,$$ where $f(x_0)$ is the actual target…
Abhinav Gupta · 1,511
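For orientation, the variance term comes from writing the least-squares prediction at $x_0$ as a fixed linear combination of the responses: with $\hat f(x_0)=x_0^T(X^TX)^{-1}X^Ty$ and $h(x_0)=X(X^TX)^{-1}x_0$, so that $\hat f(x_0)=h(x_0)^Ty$, independent errors with variance $\sigma_\epsilon^2$ give

$$
\operatorname{Var}\hat f(x_0)=h(x_0)^T\operatorname{Var}(y)\,h(x_0)=\|h(x_0)\|^2\sigma_\epsilon^2.
$$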
10 votes · 1 answer

Statistical Learning. Contradictions?

I am currently re-reading some chapters of An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (Springer, 2015). Now I have some doubts about what is said there. Above…
markowitz · 3,964
10 votes · 1 answer

Do multiple deep descents exist?

To my knowledge, the phenomenon of deep double descent is still not well understood, but several authors have reported what they call model-wise double descent ("double descents" observed as models get bigger). This is framed in the abstract…