Previously, I would have assumed that evaluating the classification performance of a decision tree and an SVM with a PR curve would obviate the need for under-/over-sampling, since the PR curve does not use true negatives. However, the question "Optimising for Precision-Recall curves under class imbalance" implies that training with under-/over-sampling and then testing on imbalanced data (as I would be doing) can still lead to differing results. I couldn't draw a conclusion from that thread as to why this occurs, and I am wondering whether I should still upsample even if I evaluate on an unbalanced test set.
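For concreteness, here is a minimal sketch (assuming scikit-learn, and a synthetic dataset in place of my real one) of the comparison I have in mind: one tree trained on the raw imbalanced data and one trained on an upsampled copy, both scored with average precision (a PR-curve summary) on the same untouched, imbalanced test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import average_precision_score
from sklearn.utils import resample

# Synthetic ~99:1 imbalanced binary problem (stand-in for the real data)
X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Baseline: train on the imbalanced training data as-is
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Upsample the minority class (with replacement) in the training set only
X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
X_min_up, y_min_up = resample(
    X_min, y_min, n_samples=int((y_train == 0).sum()), random_state=0)
X_up = np.vstack([X_train[y_train == 0], X_min_up])
y_up = np.concatenate([y_train[y_train == 0], y_min_up])
tree_up = DecisionTreeClassifier(random_state=0).fit(X_up, y_up)

# Both models are evaluated on the same unmodified, imbalanced test set
for name, model in [("raw", tree_raw), ("upsampled", tree_up)]:
    scores = model.predict_proba(X_test)[:, 1]
    print(name, average_precision_score(y_test, scores))
```

My expectation was that the two average-precision numbers should be essentially interchangeable, but the linked question suggests they may not be.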
– Our Stephan Kolassa on class imbalance: https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he – Dave Feb 17 '21 at 18:30
– @Dave Thanks, that is a good resource. Keep in mind, though, that my question is about whether under-/over-sampling should be necessary at all. Before seeing the question linked in my post, my belief was that I could just use a PR curve (as ostensibly suggested in Stephan's post), but that question changed my mind, given the OP's differing results when training on unbalanced vs. balanced data even when using a PR curve. Any thoughts? – Prospero Feb 17 '21 at 19:18