I have two versions of the same dataset: one that is fully balanced and one in which the positives:negatives ratio is 1:2.
In both cases, when I train my SVM classifier I get low recall and quite high precision, specifically:
- balanced: 0.7 recall and 0.87 precision
- imbalanced: 0.52 recall and 0.82 precision
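
For reference, this is roughly how I train and evaluate (simplified: the synthetic data, split and SVM parameters here are just stand-ins for my actual pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in for my data with roughly 1:2 positives:negatives
# (class 0 = negative gets ~67% of samples, class 1 = positive gets ~33%)
X, y = make_classification(
    n_samples=3000, n_features=20, weights=[0.67, 0.33], random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = SVC(kernel="rbf", C=1.0)   # plain SVM, no class_weight adjustment
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("precision:", precision_score(y_test, y_pred))  # precision of the positive class
print("recall:   ", recall_score(y_test, y_pred))     # recall of the positive class
```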
I would have expected the recall to increase on the imbalanced dataset, since there are now many more negatives than positives available to be misclassified (see here as well). Could anyone explain this behaviour? Can I explain it by saying that having fewer positives makes it harder for the classifier to learn what a positive looks like?