Bagging dependent data

Asked Feb 23 '22 at 12:44

Active Mar 03 '22 at 14:28

Viewed 15 times

What are the possible caveats of using a Bagging algorithm (such as Random Forest) when the data are not independent?

Ensemble methods often use Bagging to reduce variance by aggregating several models, each fitted on a bootstrap sample of the original dataset. However, the observations may be dependent, as happens with time-series or longitudinal data.
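To make the concern concrete, here is a minimal sketch contrasting the i.i.d. bootstrap used by standard Bagging with a moving-block bootstrap, one commonly discussed resampling scheme for serially dependent data. The function names, the series length and the block length of 20 are illustrative assumptions, not part of any library, and the sketch only looks at which indices get drawn, not at any fitted model.

```python
import random


def iid_bootstrap_indices(n, rng):
    """Standard bagging resample: draw n indices independently, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def moving_block_bootstrap_indices(n, block_len, rng):
    """Moving-block resample: concatenate randomly chosen contiguous blocks, trimmed to n."""
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and n")
    indices = []
    while len(indices) < n:
        start = rng.randrange(n - block_len + 1)  # block must fit inside the series
        indices.extend(range(start, start + block_len))
    return indices[:n]


def adjacent_fraction(indices):
    """Fraction of consecutive draws that were neighbours in the original series."""
    pairs = list(zip(indices, indices[1:]))
    return sum(b == a + 1 for a, b in pairs) / len(pairs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 200                 # hypothetical series length
iid = iid_bootstrap_indices(n, rng)
mbb = moving_block_bootstrap_indices(n, block_len=20, rng=rng)

print("i.i.d. bootstrap, adjacent fraction:", adjacent_fraction(iid))
print("block bootstrap, adjacent fraction: ", adjacent_fraction(mbb))
```

With i.i.d. resampling almost no consecutive draws are neighbours in the original series, whereas the block scheme keeps most neighbouring observations together. Whether that difference matters for the bagged predictor is exactly what I am asking about.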

From a theoretical point of view, is this a problem?

Thank you in advance!

edited Mar 03 '22 at 14:28

asked Feb 23 '22 at 12:44

Niccolò Ajroldi

Note: I have seen several questions related to this topic, e.g.: (a) https://stats.stackexchange.com/questions/52016/non-independence-of-ivs-in-a-random-forest-model (b) https://stats.stackexchange.com/questions/245104/random-forest-with-longitudinal-data (c) https://stats.stackexchange.com/questions/123917/using-random-forest-for-survival-analysis-with-time-varying-covariates Nevertheless, none of them really answers the question of how Bagging is affected by dependent observations. – Niccolò Ajroldi Mar 03 '22 at 14:21


0 Answers