I have two versions of the same dataset: one that is fully balanced and one in which the positives:negatives ratio is 1:2.
In both cases, when I train my SVM classifier I get low recall and quite high precision, specifically:
- balanced: 0.7 recall and 0.87 precision
- imbalanced: 0.52 recall and 0.82 precision
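
For reference, this is roughly how I train and evaluate (simplified: the synthetic data, split and SVM parameters here are just stand-ins for my actual pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in for my data with roughly 1:2 positives:negatives
# (class 0 = negative gets ~67% of samples, class 1 = positive gets ~33%)
X, y = make_classification(
    n_samples=3000, n_features=20, weights=[0.67, 0.33], random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = SVC(kernel="rbf", C=1.0)   # plain SVM, no class_weight adjustment
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("precision:", precision_score(y_test, y_pred))  # precision of the positive class
print("recall:   ", recall_score(y_test, y_pred))     # recall of the positive class
```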
I would have expected the recall to increase on the imbalanced dataset, since there are now many more negatives than positives available to be misclassified (see here as well). Could anyone explain this behaviour? Can I explain it by saying that having fewer positives makes it harder for the classifier to learn what a positive looks like?