Previously, I would have assumed that evaluating the classification performance of a decision tree and an SVM with a PR curve would obviate the need for under-/over-sampling, since the PR curve does not use true negatives. However, the question "Optimising for Precision-Recall curves under class imbalance" implies that training with under-/over-sampling and then testing on imbalanced data (as I would be doing) can still lead to differing results. I couldn't draw a conclusion from that thread as to why this occurs, and I am wondering whether I should still upsample even if I evaluate on an unbalanced test set.
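For concreteness, here is a minimal sketch (assuming scikit-learn, and a synthetic dataset in place of my real one) of the comparison I have in mind: one tree trained on the raw imbalanced data and one trained on an upsampled copy, both scored with average precision (a PR-curve summary) on the same untouched, imbalanced test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import average_precision_score
from sklearn.utils import resample

# Synthetic ~99:1 imbalanced binary problem (stand-in for the real data)
X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Baseline: train on the imbalanced training data as-is
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Upsample the minority class (with replacement) in the training set only
X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
X_min_up, y_min_up = resample(
    X_min, y_min, n_samples=int((y_train == 0).sum()), random_state=0)
X_up = np.vstack([X_train[y_train == 0], X_min_up])
y_up = np.concatenate([y_train[y_train == 0], y_min_up])
tree_up = DecisionTreeClassifier(random_state=0).fit(X_up, y_up)

# Both models are evaluated on the same unmodified, imbalanced test set
for name, model in [("raw", tree_raw), ("upsampled", tree_up)]:
    scores = model.predict_proba(X_test)[:, 1]
    print(name, average_precision_score(y_test, scores))
```

My expectation was that the two average-precision numbers should be essentially interchangeable, but the linked question suggests they may not be.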
– Our Stephan Kolassa on class imbalance: https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he – Dave Feb 17 '21 at 18:30
– @Dave Thanks, that is a good resource. Keep in mind, though, that my question is about whether under-/over-sampling should be necessary at all. Before seeing the question linked in my post, my belief was that I could just use a PR curve (as ostensibly suggested in Stephan's post), but that question changed my mind, given the OP's differing results when training on unbalanced vs. balanced data even when using a PR curve. Any thoughts? – Prospero Feb 17 '21 at 19:18