Can unbalanced classes introduce bias in a Random Forest model?

Asked Jul 27 '18 at 19:36

Active Jul 27 '18 at 19:53

I am working on a classification problem using Random Forest.

The training set has 600 instances and 16 attributes. The class label is a Yes/No answer, and the ratio of "Yes" to "No" in the training set is about 5:1. Is it true (as I was told) that such a disparity (many more "Yes" than "No") can introduce a bias when predicting the class of an unknown instance with this model?
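To make the numbers concrete, here is a small sketch of what a 5:1 split of 600 instances implies. It assumes the ratio is exactly 5, which the question only states approximately. The "balanced" weights use one common heuristic, n_samples / (n_classes * class_count), which is not something the question itself mentions.

```python
# Assumed figures: 600 training rows and a Yes:No ratio of exactly 5
# (the question only says "around 5").
n_total = 600
yes_to_no = 5

n_no = n_total // (yes_to_no + 1)
n_yes = n_total - n_no
counts = {"Yes": n_yes, "No": n_no}

# Accuracy of a trivial model that always answers with the majority class.
# Any classifier should be compared against this, not against 50%.
majority_accuracy = max(counts.values()) / n_total

# "Balanced" class weights: n_samples / (n_classes * class_count).
# This is one common reweighting heuristic, shown for illustration only.
balanced_weights = {
    label: n_total / (len(counts) * count) for label, count in counts.items()
}

print(counts)
print(f"majority-class accuracy: {majority_accuracy:.3f}")
print(balanced_weights)
```

Under these assumptions, a model that ignores the attributes entirely would already be right about five times out of six, so plain accuracy says little about how well the "No" class is predicted.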

edited Jul 27 '18 at 19:53

Karolis Koncevičius

asked Jul 27 '18 at 19:36

Rik Ghosh

Is https://stats.stackexchange.com/questions/227088/when-should-i-balance-classes-in-a-training-data-set helpful? – Ben Bolker Jul 27 '18 at 20:49
[This may be helpful.](https://stats.stackexchange.com/q/357466/1352) A similar simulation using RFs and your proposed scenario might be informative. – Stephan Kolassa Jul 28 '18 at 06:07
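
A rough sketch of the kind of simulation suggested in the comment above, assuming scikit-learn is installed. The data are synthetic (`make_classification`), not the asker's, and the settings (`n_informative=5`, 200 trees, 5-fold cross-validation) are arbitrary illustrative choices. Label 1 plays the role of "Yes" and label 0 the role of "No".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the scenario in the question: 600 rows, 16 features,
# roughly five "Yes" (label 1) for every "No" (label 0).
X, y = make_classification(
    n_samples=600,
    n_features=16,
    n_informative=5,
    weights=[1 / 6, 5 / 6],
    random_state=0,
)

# Stratified folds keep the class ratio roughly the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def per_class_recall(class_weight):
    """Return (recall on "No", recall on "Yes") from out-of-fold predictions."""
    model = RandomForestClassifier(
        n_estimators=200, class_weight=class_weight, random_state=0
    )
    pred = cross_val_predict(model, X, y, cv=cv)
    recall_no, recall_yes = recall_score(y, pred, average=None, labels=[0, 1])
    return recall_no, recall_yes


# Compare an unweighted forest with one that reweights classes by frequency.
for cw in (None, "balanced"):
    recall_no, recall_yes = per_class_recall(cw)
    print(f"class_weight={cw}: recall(No)={recall_no:.2f}, recall(Yes)={recall_yes:.2f}")
```

Comparing recall per class, rather than overall accuracy, is what would show whether the forest favours the majority "Yes" class. The result on synthetic data need not carry over to the real training set.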
