
I am training a random forest model with the scikit-learn library for a binary classification task. For some reason, when I set the `max_depth` parameter to 1, the model averages 90% accuracy when predicting positive labels (sensitivity), but only around 30% when predicting negative labels (specificity). When I increase `max_depth`, sensitivity and specificity begin to even out. I am unsure what causes the skewed sensitivity; does anyone know of a possible explanation?

Note: my train and test data sets both have a relatively even number of positive and negative examples.
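To make the question concrete, here is a minimal sketch of the setup I am describing (using a synthetic balanced dataset from `make_classification` in place of my real data): it trains a stump forest and a fully grown forest and reports per-class recall, where sensitivity is recall on the positive class and specificity is recall on the negative class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Balanced synthetic data standing in for the real dataset
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.5, 0.5], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, None):  # decision stumps vs. fully grown trees
    clf = RandomForestClassifier(max_depth=depth, random_state=0)
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    print(f"max_depth={depth}: sensitivity={sensitivity:.2f}, "
          f"specificity={specificity:.2f}")
```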

  • When you adjust the `max_depth` parameter, you actually adjust the size of the trees. With a value of 1 you grow decision stumps (trees with only 2 leaves), which will always lead to poor performance, because RF trees need to be grown as deep as possible; see http://stats.stackexchange.com/questions/169357/random-forest-overfitting-r/172476#172476 and http://stats.stackexchange.com/questions/173390/gradient-boosting-tree-vs-random-forest/174020#174020. When you increase `max_depth`, I would guess that not only does the pos/neg balance even out, but overall performance also increases, right? – Antoine Jul 18 '16 at 10:04
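A quick, self-contained sketch of the point in the comment above (synthetic data, names of my choosing): with `max_depth=1` every tree in the forest is a decision stump, i.e. a single split with at most 3 nodes (root plus 2 leaves), whereas an unrestricted forest grows much larger trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# max_depth=1 forces every tree to be a decision stump
stumps = RandomForestClassifier(max_depth=1, random_state=0).fit(X, y)
# max_depth=None lets trees grow until leaves are pure (scikit-learn default)
deep = RandomForestClassifier(max_depth=None, random_state=0).fit(X, y)

# Each stump has at most 3 nodes; fully grown trees are far larger
print(max(t.tree_.node_count for t in stumps.estimators_))
print(max(t.tree_.node_count for t in deep.estimators_))
```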

0 Answers