
I want to optimize random forest hyperparameters using random search. I have specified a range of 100 to 1000 for the number of trees. How can I tell whether a given number of trees is overfitting the data? For example, say random search returns 450 trees as the ideal value, but in truth 156 would be just enough.

Thank you.

Aizzaac
  • Other parameters of random forests (e.g. tree depth) can lead to overfitting, but the number of trees shouldn't. Increasing the number of trees can improve performance, although you'll see diminishing returns at some point. To test this for yourself, you could plot validation error (using cross-validation or a held-out portion of the data) vs. number of trees. The error shouldn't increase as trees are added. These threads might be useful: http://stats.stackexchange.com/questions/111968/random-forest-how-to-handle-overfitting and http://datascience.stackexchange.com/questions/6380/how-to-avoid-overfitting-in-random-forest – user20160 Jan 20 '17 at 05:31
  • Adding additional trees will improve the model until reaching a plateau. https://stats.stackexchange.com/questions/348245/do-we-have-to-tune-the-number-of-trees-in-a-random-forest – Sycorax May 28 '18 at 03:47
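
The check suggested in the comments (validation error vs. number of trees) could be sketched roughly as follows. This is only an illustration, not the asker's setup: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the real data, and the names `tree_counts` and `val_errors` as well as the specific tree counts are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# warm_start=True keeps the trees that were already grown and only adds
# new ones when n_estimators is raised, so the sweep stays cheap.
forest = RandomForestClassifier(warm_start=True, random_state=0)

tree_counts = [10, 25, 50, 100, 200, 400]  # hypothetical grid
val_errors = []
for n in tree_counts:
    forest.set_params(n_estimators=n)
    forest.fit(X_train, y_train)
    # Error on the held-out portion = 1 - accuracy.
    val_errors.append(1.0 - forest.score(X_val, y_val))

for n, err in zip(tree_counts, val_errors):
    print(f"{n:4d} trees: validation error = {err:.3f}")
```

If the printed (or plotted) error flattens out after some number of trees, values beyond that point mostly cost extra compute rather than causing overfitting, so the smallest count on the plateau is a reasonable choice.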

0 Answers