
If we have a classification or regression problem, when would we generally prefer to use families of models with high bias and low variance, such as multiple linear regression (or logistic regression for classification)?

In other words, why would one use a high-bias model rather than a high-variance model whose variance we then try to reduce?

Math_cat
  • My answer here (https://stats.stackexchange.com/questions/31088/are-inconsistent-estimators-ever-preferable/462086#462086) can help you. In other terms, a model with bias can be useful for prediction (read here: https://stats.stackexchange.com/questions/202278/endogeneity-in-forecasting/271993#271993) – markowitz May 05 '20 at 13:53
  • Linear models are not *per se* high-bias models. It all depends on the modeler. By using the right interactions and non-linear terms, bias decreases and variance increases. – Michael M May 06 '20 at 08:54

2 Answers


Presumably your aim is to minimise out-of-sample prediction error or estimation error in some sense.

Here is a simple non-regression example:

  • Suppose you have a normally distributed random variable with unknown mean $\mu$ and variance $\sigma^2$, and you want to estimate $\sigma^2$ from a sample of size $n$.

  • You decide to use some fraction of $\sum (x_i-\bar x)^2$, which has expectation $(n-1)\sigma^2$ and variance $2(n-1)\sigma^4$.

  • If you use as your estimator $s_k^2 = \frac{1}{k}\sum (x_i-\bar x)^2$, then the bias is $\mathbb E[s_k^2-\sigma^2] = \frac{n-1-k}{k}\sigma^2$, the variance is $\mathrm{Var}(s_k^2) = \frac{2(n-1)}{k^2} \sigma^4$, and the expected square of the error is the variance plus the square of the bias, i.e. $\mathbb E[(s_k^2-\sigma^2)^2] = \frac{n^2-2nk+k^2+2k-1}{k^2}\sigma^4$.

It is common to consider $k = n-1$, $n$, or $n+1$:

  • $s_{n-1}^2 =\frac1{n-1}\sum (x_i-\bar x)^2$ is unbiased and often called the sample variance
  • $s_{n}^2 = \frac1{n}\sum (x_i-\bar x)^2$ is the maximum likelihood estimator but is biased downwards by $\frac{\sigma^2}{n}$
  • $s_{n+1}^2 = \frac1{n+1}\sum (x_i-\bar x)^2$ minimises $\mathbb E[(s_k^2-\sigma^2)^2]$ (a short derivation follows below) but is biased downwards by $\frac{2\sigma^2}{n+1}$

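For completeness, here is a short derivation of why $k = n+1$ minimises the mean squared error. Writing $a = n-1$,

$$\mathbb E[(s_k^2-\sigma^2)^2] = \frac{2a + (a-k)^2}{k^2}\sigma^4 = \left(\frac{a^2+2a}{k^2} - \frac{2a}{k} + 1\right)\sigma^4,$$

and setting the derivative with respect to $k$ to zero, $-\frac{2(a^2+2a)}{k^3} + \frac{2a}{k^2} = 0$, gives $k = \frac{a^2+2a}{a} = a + 2 = n+1$.
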
For predictive purposes it may not be that you want to minimise the variance of an estimator (if you do, then just choose a constant such as $0$) or that you want to eliminate the bias of an estimator as ends in themselves; it may be more that you really want to minimise their combined effect on the error.
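
As a quick numerical check, here is a minimal Monte Carlo sketch (assuming NumPy; the sample size, true variance and number of replications are arbitrary illustrative choices, not taken from the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000  # illustrative values only

# reps independent samples of size n from N(0, sigma2)
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

for k, label in [(n - 1, "unbiased"), (n, "MLE"), (n + 1, "min-MSE")]:
    est = ss / k
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()
    print(f"k = {k:2d} ({label}): bias = {bias:+.4f}, MSE = {mse:.4f}")
```

With these settings the empirical MSE should come out smallest for $k = n+1$, even though that estimator has the largest bias of the three.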

Henry

One case arises in high-dimensional settings with many parameters, where penalised estimators are used; in your question this could be logistic regression with the lasso. The shrinkage decreases variance by driving some coefficients (possibly significant ones) to zero, but at the same time it increases the bias.
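
A minimal sketch of this first case (assuming scikit-learn; the synthetic data and the grid of penalty strengths are invented for illustration): stronger L1 shrinkage zeroes out more coefficients, trading variance for bias.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: many features, few of them informative (illustrative choice)
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# Smaller C means stronger shrinkage: more bias, less variance
for C in (0.01, 0.1, 1.0, 10.0):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    n_zero = int((clf.fit(X, y).coef_ == 0).sum())
    print(f"C = {C:5.2f}: CV accuracy = {acc:.3f}, zeroed coefficients = {n_zero}")
```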

Another case that comes to mind is consistent model selection (in a regression setup, for example with BIC): with probability tending to one we choose the correct model, but on a moderate-sized data set the selected model can be "smaller" than the true one, which can introduce a large bias.
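
A toy sketch of this second case (the data-generating process and settings are illustrative assumptions, not taken from the answer): selecting a polynomial degree by BIC, where a moderate sample can favour a model smaller than the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(-2.0, 2.0, n)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, n)  # true model is quadratic

# Gaussian BIC up to constants: n*log(RSS/n) + (number of parameters)*log(n)
for degree in range(1, 6):
    X = np.vander(x, degree + 1)                # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    bic = n * np.log(rss / n) + (degree + 1) * np.log(n)
    print(f"degree {degree}: BIC = {bic:7.2f}")
# On a small sample the BIC minimiser may be degree 1: a "smaller" model with extra bias.
```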

ABK
  • Yes, logistic regression is a model with high bias. But when would we pick logistic regression versus starting with, for instance, a neural network with hidden layers, which has low bias but high variance? – Math_cat May 05 '20 at 13:29
  • Well, in this case the answer is rather straightforward (I do not mean to be impolite): we go for the case with high bias and lower variance if it gives us the desired out-of-sample performance scores and/or interpretability. – ABK May 05 '20 at 13:31
  • Oh no, it's not impolite at all. This question just bugs me. =) That's what I thought would be the case: **interpretability**. Also, we can still get the same or better performance by sacrificing it than if we go for high variance. – Math_cat May 05 '20 at 13:34
  • Dear Math_cat, I have edited the answer. – ABK May 05 '20 at 13:37
  • Thank you ABK. It was a bit more of a philosophical question, I guess, about selecting between different families of models: in the high-dimensional case, we could still opt for reducing the variance of a high-variance model. Is there any other intrinsic reason, apart from test error and interpretability, to consider high-bias models? I guess the test error would be lower for high-bias models with regularisation if we have small datasets? – Math_cat May 05 '20 at 13:47
  • "I guess the test error would be lower for high-bias models with regularisation if we have small datasets?" I would say that if we have a small data set then regularisation could be a must, and in that case we have to go for high bias. The bias-variance trade-off is just something that we cannot, in principle, avoid. – ABK May 05 '20 at 13:54