Should I balance the data set for a survival random forest? By subsampling I will lose information in the data set. However, I would do that for a random forest used for classification. Should it also be done in survival analysis? I am not sure whether there is a conceptual difference.
- What do you mean by survival random forest? – Itamar Mushkin Jul 27 '20 at 12:48
- A random forest with a survival object as the response variable. It is trained with the randomForestSRC package in R. – pikachu Jul 27 '20 at 12:55
- Don't balance in either case. [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Jul 27 '20 at 13:20
1 Answer
Don't balance in either case. See [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352)
(Converted from a comment. For my rationale, see here. On short answers, see here. Better and longer answers are always welcome.)
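
As a rough illustration of what can go wrong in the survival setting, here is a minimal sketch in plain Python. It is not taken from the linked thread, and it swaps the random survival forest for a simple Kaplan-Meier estimate to keep things short. The simulated rates, the sample size, and the helper names (`kaplan_meier`, `simulate`, `balance_by_dropping_censored`) are all assumptions made for the example. It "balances" the data by randomly dropping censored rows until they match the number of events, which shrinks every risk set and so pushes the estimated survival curve down.

```python
import random


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(t)) pairs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def simulate(n, seed=0):
    """Simulated data (assumed rates): roughly 25% events, 75% censored."""
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        event_time = rng.expovariate(0.1)   # latent event time
        censor_time = rng.expovariate(0.3)  # latent censoring time
        times.append(min(event_time, censor_time))
        events.append(1 if event_time <= censor_time else 0)
    return times, events


def balance_by_dropping_censored(times, events, seed=0):
    """Randomly drop censored rows until censored and event counts match."""
    rng = random.Random(seed)
    event_idx = [i for i, e in enumerate(events) if e == 1]
    cens_idx = [i for i, e in enumerate(events) if e == 0]
    keep_cens = rng.sample(cens_idx, min(len(event_idx), len(cens_idx)))
    keep = sorted(event_idx + keep_cens)
    return [times[i] for i in keep], [events[i] for i in keep]


times, events = simulate(2000)
full_curve = kaplan_meier(times, events)

bal_times, bal_events = balance_by_dropping_censored(times, events)
balanced_curve = kaplan_meier(bal_times, bal_events)

# Both curves share the same event times, so they can be compared pointwise.
mid = len(full_curve) // 2
print("time:", round(full_curve[mid][0], 2))
print("S(t) on full data:    ", round(full_curve[mid][1], 3))
print("S(t) on balanced data:", round(balanced_curve[mid][1], 3))
```

In this sketch the balanced curve can never lie above the full-data curve: all events are kept, but each event is divided by a smaller number at risk. Censored observations still carry the information "survived at least this long", and throwing them away discards exactly that.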

Stephan Kolassa