
What are the possible caveats of using a bagging algorithm (such as Random Forest) when the data are not independent?

Ensemble models often use bagging to reduce variance by aggregating several models, each fit on a bootstrap sample of the original dataset. However, the observations may be dependent, as is the case with time-series or longitudinal data.
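To make the concern concrete, here is a minimal sketch of the resampling step. It is my own illustration, and the helper names (`ar1_series`, `bootstrap_sample`, `lag1_autocorr`) and the AR(1) toy series are assumptions chosen for the example, not part of any library. It draws one ordinary bootstrap sample, as bagging would, from a strongly autocorrelated series and compares the lag-1 autocorrelation before and after resampling.

```python
import random


def ar1_series(n, phi=0.9, seed=0):
    """Simulate a toy AR(1) series: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def bootstrap_sample(data, rng):
    """Draw len(data) observations with replacement, ignoring their order."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


series = ar1_series(2000)
resampled = bootstrap_sample(series, random.Random(1))

print("lag-1 autocorrelation, original series: ", round(lag1_autocorr(series), 3))
print("lag-1 autocorrelation, bootstrap sample:", round(lag1_autocorr(resampled), 3))
```

In this toy setting the original series is strongly autocorrelated, while the bootstrap sample shows essentially none, because each draw is made independently of its neighbours. This only shows that the standard bootstrap discards the dependence structure; it does not by itself say how the bagged predictor's bias or variance is affected.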

From a theoretical point of view, is this a problem?

Thank you in advance!

  • Note: I have seen several questions related to this topic, e.g.: (a) https://stats.stackexchange.com/questions/52016/non-independence-of-ivs-in-a-random-forest-model (b) https://stats.stackexchange.com/questions/245104/random-forest-with-longitudinal-data (c) https://stats.stackexchange.com/questions/123917/using-random-forest-for-survival-analysis-with-time-varying-covariates Nevertheless, no one really answered the question of how Bagging is affected by dependent observations. – Niccolò Ajroldi Mar 03 '22 at 14:21

0 Answers