When comparing the performance of classifiers across two different datasets, I use the average precision metric (the datasets are very imbalanced, so ROC AUC or plain precision are not preferable, as has been discussed in this community often, e.g. here).
Now, what if the datasets I am comparing differ strongly in their class imbalance? We know that the baseline value for PR AUC / average precision is the share of positive examples in the dataset. Imagine I want to compare the performance of a classifier between the "raw" dataset and one where I used over- or undersampling techniques to counteract the class imbalance:
|                                                     | Raw Dataset | Over/Undersampled Dataset |
|-----------------------------------------------------|-------------|---------------------------|
| Share of Positive Class                             | 5%          | 20%                       |
| Average Precision Score                             | 50%         | 60%                       |
| Improvement over Baseline (AP − Share of Positives) | 45%         | 40%                       |
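For concreteness, here is a minimal sketch of the comparison I have in mind. The synthetic data from `make_classification`, the logistic regression model, and the simple random `undersample` helper are just placeholders for my actual setup; note that the resampling is applied before the train/test split, so the evaluation baseline itself changes, as in the table above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic data with ~5% positives, standing in for the "raw" dataset.
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.95, 0.05], random_state=0
)

def undersample(X, y, pos_share=0.20, seed=0):
    """Randomly drop negatives until positives make up `pos_share` of the data."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_neg = int(len(pos_idx) * (1 - pos_share) / pos_share)
    keep = np.concatenate([pos_idx, rng.choice(neg_idx, n_neg, replace=False)])
    return X[keep], y[keep]

def evaluate(X, y, name):
    """Fit a classifier on one dataset and report AP against the positive-share baseline."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    ap = average_precision_score(y_te, clf.predict_proba(X_te)[:, 1])
    baseline = y_te.mean()  # expected AP of a no-skill classifier = share of positives
    print(f"{name}: baseline={baseline:.1%}, AP={ap:.1%}, improvement={ap - baseline:.1%}")

evaluate(X, y, "raw")
evaluate(*undersample(X, y), "undersampled")
```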
Is it correct to conclude that the classifier performs better when trained on the raw dataset? Or is this way of comparing performance between datasets flawed in itself? Are there other approaches that make sense?