Can somebody give me a non-mathematical intuition why Bootstrap aggregating reduces overfitting?
From my point of view, we are not providing any additional information, and we are not really increasing the number of observations.
This is the phenomenon where an ensemble of so-called weak learners (see) yields good performance when their decisions are combined. The reason for this is explained by Dietterich here:
Uncorrelated errors made by the individual classifiers can be removed by voting.
Further explanation or a theoretical justification of this statement may well be an open research problem.
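To make the quoted statement concrete, here is a minimal simulation sketch (the per-classifier accuracy of 0.6 and the ensemble size of 25 are arbitrary assumptions, not values from Dietterich): 25 classifiers that are each right only 60% of the time, with independent errors, reach roughly 85% accuracy under majority voting.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 100_000   # number of test cases
n_learners = 25      # ensemble size (assumed)
p_correct = 0.6      # each weak learner alone is right 60% of the time (assumed)

# Each learner makes an independent (uncorrelated) error on each case.
votes = rng.random((n_trials, n_learners)) < p_correct

# Majority vote: the ensemble is correct when more than half the learners are.
ensemble_correct = votes.sum(axis=1) > n_learners / 2

print(f"single learner accuracy : {p_correct:.3f}")
print(f"majority vote accuracy  : {ensemble_correct.mean():.3f}")  # roughly 0.85
```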
To illustrate why averaging reduces the standard deviation and makes the prediction more accurate, I'll give an example.
Let's suppose that we have two models whose predictions are random variables $X_1 \sim N(\mu, \sigma), X_2 \sim N(\mu, \sigma)$, i.e. each prediction has a mean value plus an error term.
Assuming the errors are uncorrelated, the average $\frac{X_1 + X_2}{2}$ is also normally distributed, with mean $\mu$ and variance $\frac{\sigma^2 + \sigma^2}{4} = \frac{\sigma^2}{2}$, i.e. standard deviation $\frac{\sigma}{\sqrt{2}}$.
In other words, we preserve the same mean while reducing the standard deviation.
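A minimal sketch of this calculation (the values of $\mu$, $\sigma$ and the sample size are arbitrary assumptions): simulate the two models and check that the average keeps the mean $\mu$ while its standard deviation drops to about $\sigma/\sqrt{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 10.0, 2.0   # assumed values for illustration
n = 1_000_000

# Two models whose predictions carry uncorrelated N(0, sigma) errors.
x1 = rng.normal(mu, sigma, n)
x2 = rng.normal(mu, sigma, n)
avg = (x1 + x2) / 2

print(f"std of a single model : {x1.std():.3f}")   # ~ sigma = 2.000
print(f"std of the average    : {avg.std():.3f}")  # ~ sigma/sqrt(2) = 1.414
print(f"mean of the average   : {avg.mean():.3f}") # ~ mu = 10.000
```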
Having said that, in reality the errors have some correlation. We can still achieve a variance reduction, but it has limits and cannot be driven towards zero simply by increasing the ensemble size.
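As a sketch of this limit (the correlation $\rho = 0.3$ and the other numbers below are assumptions made for illustration, not values from the answer): if $n$ models have errors with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of their average is $\rho\sigma^2 + \frac{1-\rho}{n}\sigma^2$, which approaches $\rho\sigma^2$ rather than zero as $n$ grows. A small simulation shows the floor:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, rho = 10.0, 2.0, 0.3       # rho: pairwise error correlation (assumed)
n_cases, n_models = 200_000, 50

# Build errors with pairwise correlation rho via a shared noise component.
shared = rng.normal(0, 1, (n_cases, 1))
own = rng.normal(0, 1, (n_cases, n_models))
errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own)

preds = mu + errors
ensemble = preds.mean(axis=1)

print(f"std of a single model : {preds[:, 0].std():.3f}")    # ~ sigma = 2.000
print(f"std of the ensemble   : {ensemble.std():.3f}")       # ~ 1.12 for n = 50
print(f"limit as n grows      : {sigma * np.sqrt(rho):.3f}") # sqrt(rho)*sigma ~ 1.095
```

No matter how many models are added, the ensemble's standard deviation cannot drop below $\sqrt{\rho}\,\sigma$; bagging helps mainly by decorrelating the errors so that $\rho$ itself is small.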