
Going through a lecture note on the bias-variance trade-off, I didn't follow the latter part of this paragraph:

> It shows the common situation in practice that
>
> (1) for simple models, the bias increases very quickly, while
>
> (2) for complex models, the variance increases very quickly. Since the riskiness is additive in the two, the optimal complexity is somewhere in the middle. Note, however, that these properties do not follow from the bias-variance decomposition, and need not even be true.

The 'It' in the above paragraph refers to the image below:

[Image: the usual plot of squared bias, variance, and their sum (the risk) against model complexity, with the minimum of the sum at an intermediate complexity]
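
To make the plot concrete, here is a minimal simulation sketch of the kind of experiment it summarizes. This is my own construction, not from the lecture note: the sinusoidal ground truth, the noise level, and the use of polynomial degree as the complexity axis are all assumed for illustration. Typically the estimated bias² falls and the variance rises with the degree, which is the shape in the image:

```python
import numpy as np

# Minimal sketch: estimate bias^2 and variance as a function of model
# complexity (polynomial degree), averaged over many resampled training sets.
# Ground truth, noise level, and sample sizes are all assumed for illustration.
rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)  # assumed ground truth

n_train, n_reps, sigma = 30, 500, 0.3
x_test = np.linspace(0.05, 0.95, 50)  # fixed evaluation points
degrees = range(1, 10)                # the "complexity" axis

for d in degrees:
    preds = np.empty((n_reps, x_test.size))
    for r in range(n_reps):
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, sigma, n_train)
        preds[r] = np.polyval(np.polyfit(x, y, d), x_test)
    avg_pred = preds.mean(axis=0)                      # E[f_hat(x)] over training sets
    bias2 = np.mean((avg_pred - true_f(x_test)) ** 2)  # average squared bias
    var = preds.var(axis=0).mean()                     # average pointwise variance
    print(f"degree {d}: bias^2={bias2:.4f}  var={var:.4f}  "
          f"risk~{bias2 + var + sigma**2:.4f}")
```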

Questions:

1) If these are properties, then why don't they follow from the bias-variance decomposition, which states that $\mathbb{E}[(Y-\hat{f}(x))^2] = \sigma^2 + \operatorname{Bias}[\hat{f}(x)]^2 + \operatorname{Var}(\hat{f}(x))$?

2) And under what conditions are they not true?

naive
  • 1) Why should they follow from the decomposition? It doesn't say anything about complexity. The definition you have in 1) is for a given model complexity. As for 2), if you somehow had the true underlying model (not estimated), then, because there is no concept of complexity in that case (we know the true model), the bias wouldn't be a function of complexity and neither would the variance, so the relation wouldn't hold. – mlofton Feb 01 '19 at 15:18
  • There may be a potential to misread the quotation. "These properties" refers to the assertions about rates of increase in bias and variance for "simple" and "complex" models, *not* to the bias and variance. At best these properties are *heuristics*, and obviously they are separate from the bias-variance trade-off relationship. – whuber Feb 01 '19 at 20:30
  • @mlofton The definition in 1) is for *any* model complexity. Re 2): why is there no complexity if we have the *true* model? Your second point contradicts the first: you first say that the decomposition does not say anything about complexity, and at the same time you implicitly assume in the second that it does. – naive Feb 01 '19 at 20:32
  • @whuber I must agree with your comment that these properties are heuristics. But they are nevertheless important for understanding the bias-variance trade-off relationship in a very intuitive way. I am just looking for something that will supplement my understanding. Cheers! – naive Feb 01 '19 at 20:38
  • While @whuber is correct that these are heuristics, another way to think about things is as a guide to what we mean by model complexity. Any reasonable definition of model complexity will cause these properties to be true. – Matthew Drury Feb 01 '19 at 20:50
  • @Matthew Drury If a reasonable definition of model complexity renders these properties true, are there conditions under which they aren't? – naive Feb 02 '19 at 03:58
  • @naive: The plot is using "complexity" to mean that the various candidate models get more complex as one moves to the right along the x-axis. The properties don't explicitly follow from the definition of the bias-variance decomposition. I mean that they do follow, but how would one put "complexity" into the formula? For 2), if you have the true model, then there is no concept of complexity, because the complexity on the x-axis implies that there are many models one can use, and if we have the true model, then there is only one model. Maybe my 2) is not a good example-answer, but 1) is clear to me. – mlofton Feb 02 '19 at 15:00
  • @mlofton I believe the "complexity" is already incorporated in $\hat{f}(x)$, because complexity can be thought of as [a measure of the number of parameters to be estimated, and of how "free/independent" they are](https://stats.stackexchange.com/a/134103/168306) OR [the sensitivity of the model estimates to perturbation of the observations](https://stats.stackexchange.com/a/2830/168306). – naive Feb 04 '19 at 09:15
  • @naive: Hi. It's incorporated in the sense you described, for sure, but it's not part of the definition of the decomposition. There's no way, as far as I know, to make bias and variance a function of the components of complexity. – mlofton Feb 05 '19 at 16:59

1 Answer


As mentioned by @whuber in the comments, these properties are heuristics: they are claims about the rates at which bias and variance increase with model complexity, not about the bias and variance themselves.

From Wikipedia:

> In statistics and machine learning, the bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.

And on the bias–variance decomposition:

> The bias–variance decomposition is a way of analyzing a learning algorithm's expected generalization error with respect to a particular problem as a sum of three terms, the bias, variance, and a quantity called the irreducible error, resulting from noise in the problem itself.
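
For reference, here is the standard derivation of that decomposition (a sketch under the usual assumptions: $Y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\varepsilon$ independent of $\hat{f}$, so that the cross terms vanish when we expand and add/subtract $\mathbb{E}[\hat{f}(x)]$):

$$
\begin{aligned}
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
&= \mathbb{E}\big[(f(x) + \varepsilon - \hat{f}(x))^2\big] \\
&= \mathbb{E}[\varepsilon^2] + \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] \\
&= \sigma^2 + \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2 + \mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big] \\
&= \sigma^2 + \operatorname{Bias}\big[\hat{f}(x)\big]^2 + \operatorname{Var}\big(\hat{f}(x)\big).
\end{aligned}
$$

Nothing on the right-hand side refers to model complexity; complexity enters only through the choice of $\hat{f}$. That is why the curves in the image describe how these terms typically behave across a set of models, rather than anything implied by the identity itself.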

So, the property referred to in the question, the bias-variance tradeoff, is a property of a set of predictive models in general. As noted by @whuber, the image in the question is a way of demonstrating that property heuristically.

And the bias-variance decomposition is a way to analyze the generalization error of a learning algorithm for a particular problem.
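
As a small illustration of that kind of analysis (again my own sketch with assumed settings, not part of any referenced source), the three terms can be estimated by Monte Carlo for one fixed algorithm, say a degree-3 polynomial least-squares fit, at a single test point:

```python
import numpy as np

# Estimate sigma^2 + Bias^2 + Var for one fixed learning algorithm
# (degree-3 polynomial least squares) at one test point x0.
rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(2 * np.pi * x)  # assumed ground truth

sigma, n_train, n_reps, x0, degree = 0.3, 30, 5000, 0.25, 3

preds = np.empty(n_reps)
for r in range(n_reps):
    x = rng.uniform(0.0, 1.0, n_train)
    y = true_f(x) + rng.normal(0.0, sigma, n_train)
    preds[r] = np.polyval(np.polyfit(x, y, degree), x0)

bias2 = (preds.mean() - true_f(x0)) ** 2  # squared bias at x0
var = preds.var()                         # variance of f_hat(x0) across training sets
print(f"bias^2={bias2:.4f}  var={var:.4f}  irreducible={sigma**2:.4f}  "
      f"total={bias2 + var + sigma**2:.4f}")
```

Repeating this for each candidate model in the set is exactly the kind of experiment that produces the curves in the question's image.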

naive