
I am working on a classification problem using Random Forest.

The training set has 600 instances and 16 attributes. The target class is a Yes/No answer, and the ratio of "Yes" to "No" in the training set is around 5:1. Is it true (as I was told) that such a disparity (many more "Yes" than "No") can bias the model's predictions when it classifies an unknown instance?
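To make the size of this imbalance concrete, here is a minimal sketch that assumes the ratio is exactly 5:1, i.e. 500 "Yes" and 100 "No" out of 600. A classifier that always answers "Yes" would already be right about 83% of the time, which is one reason overall accuracy can hide poor performance on the minority class. The helper names are hypothetical; `balanced_class_weights` reproduces the n_samples / (n_classes * n_c) rule that scikit-learn documents for `class_weight="balanced"`, which is one common (not the only) way to reweight the minority class.

```python
def class_counts(n_total, yes_per_no):
    """Split n_total instances into Yes/No counts for an assumed Yes:No ratio."""
    n_no = round(n_total / (yes_per_no + 1))
    return {"Yes": n_total - n_no, "No": n_no}


def majority_baseline_accuracy(counts):
    """Accuracy of a trivial model that always predicts the majority class."""
    return max(counts.values()) / sum(counts.values())


def balanced_class_weights(counts):
    """Per-class weights n_samples / (n_classes * n_c), so each class gets equal total weight."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


counts = class_counts(600, 5)
print(counts)                                         # {'Yes': 500, 'No': 100}
print(round(majority_baseline_accuracy(counts), 3))   # 0.833
print(balanced_class_weights(counts))                 # {'Yes': 0.6, 'No': 3.0}
```

With these weights both classes contribute the same total weight (500 × 0.6 = 100 × 3.0 = 300). Whether reweighting actually helps depends on how costly each kind of misclassification is for the problem at hand.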

Karolis Koncevičius
Rik Ghosh
  • Is https://stats.stackexchange.com/questions/227088/when-should-i-balance-classes-in-a-training-data-set helpful? – Ben Bolker Jul 27 '18 at 20:49
  • [This may be helpful.](https://stats.stackexchange.com/q/357466/1352) A similar simulation using RFs and your proposed scenario might be informative. – Stephan Kolassa Jul 28 '18 at 06:07

0 Answers