
A similar question has been asked here: Binary Classification: good at predicting negative class but bad at predicting positive class, but it received no answer, so I am posting my question again.

I have a binary classification problem with 100+ features. The classes are fairly balanced (close to a 50-50 split). I don't understand why the model predicts class 0 well, with a recall of over 90%, but predicts class 1 extremely poorly, with a recall of only 17%.

When classes are imbalanced, I have often solved this kind of problem by class weighting, oversampling and under-sampling, or moving the cut-off point. Here the classes are fairly balanced, yet I still see this strange pattern.
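
For reference, this is roughly what I mean by those adjustments (a minimal sketch assuming scikit-learn and a random forest; the synthetic data, model settings and cut-off values are placeholders, not my actual pipeline):

```python
# Sketch of the usual imbalance remedies: class weighting and moving the
# decision cut-off. Assumes scikit-learn; the data is a synthetic stand-in
# for my ~100-feature, roughly balanced dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=10_000, n_features=100,
                           n_informative=20, weights=[0.5, 0.5],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Class weighting: penalise mistakes on each class in inverse proportion
# to its frequency (nearly a no-op here, since the classes are balanced).
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# Moving the cut-off: predict class 1 when its estimated probability exceeds
# a chosen value instead of the default 0.5, then compare per-class metrics.
proba = clf.predict_proba(X_te)[:, 1]
for cutoff in (0.3, 0.5, 0.7):
    y_hat = (proba >= cutoff).astype(int)
    print(f"cut-off = {cutoff}")
    print(classification_report(y_te, y_hat, digits=3))
```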

Has anyone solved this before? Any idea what is going on? What techniques could I use to improve the prediction for both classes?

direbutterfly
    If instead of asking the forests for class assignments (i.e. majority voting), you ask for probability estimates, what do you get? What is the AUC of your model? How do these metrics change as you vary the classification threshold? – Matthew Drury Mar 02 '19 at 19:57
  • Matthew Drury (+1) is alluding to the themes developed in this thread https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models – Sycorax Mar 02 '19 at 20:04
  • Thanks Matthew, the best AUC I get is 0.6. If I change the threshold, the recall for class 0 gets worse and the recall for class 1 gets better, overall landing in the 50-60% range for both (a sketch of this threshold sweep is below the comments). – direbutterfly Mar 02 '19 at 20:07
  • Thanks Sycorax, I will go through the reference thread. – direbutterfly Mar 02 '19 at 20:07
  • What is recall for class 0 and recall for class 1? – Laksan Nathan Mar 02 '19 at 20:08
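
Following up on Matthew Drury's suggestion, this is the kind of check the comments point at (again a minimal sketch assuming scikit-learn and synthetic placeholder data, not the actual model): ask the forest for probability estimates, compute the AUC, and watch how per-class recall moves as the threshold varies.

```python
# Sketch of the diagnostics suggested in the comments: probability estimates
# instead of hard majority votes, ROC AUC, and per-class recall across
# thresholds. Assumes scikit-learn; the data and model are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score

X, y = make_classification(n_samples=10_000, n_features=100,
                           n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Probability estimates rather than hard class assignments.
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))

# Sweeping the threshold shows the trade-off: lowering it raises recall for
# class 1 at the cost of recall for class 0, the pattern described above.
for t in np.linspace(0.1, 0.9, 9):
    y_hat = (proba >= t).astype(int)
    r0 = recall_score(y_te, y_hat, pos_label=0)
    r1 = recall_score(y_te, y_hat, pos_label=1)
    print(f"threshold {t:.1f}: recall(0) = {r0:.2f}, recall(1) = {r1:.2f}")
```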

0 Answers