
There are a lot of questions here about when to do class balancing, what to expect from it, or whether unbalanced classes are an issue at all.

Apparently the "consensus" among most of the top answers on these questions is that, for binary classification models:

  1. Balancing techniques are essentially an "old habit", a holdover from when estimating models was much more computationally expensive.

  2. If a model is well specified, then class imbalance shouldn't be a problem at all.

  3. Class balancing techniques introduce bias and overall make models perform worse.

  4. Most of the class-balancing discussion assumes that accuracy is the KPI to optimize and that the decision threshold is 0.5 (given that classification models output a probability between 0 and 1).
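To make point 4 concrete, here is a minimal sketch on synthetic data (all dataset sizes and thresholds below are illustrative, not from any of the linked examples): with a plain logistic regression, moving the decision threshold changes recall on the minority class, while ROC AUC is untouched because it depends only on the ranking of the predicted scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, roc_auc_score

# Illustrative imbalanced dataset: roughly 5% positives.
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Recall depends on the threshold we pick; ROC AUC does not,
# since it is computed from the ranking of the scores alone.
recall_at = {t: recall_score(y_te, scores >= t) for t in (0.5, 0.1)}
auc = roc_auc_score(y_te, scores)
print(recall_at, auc)
```

Lowering the threshold from 0.5 to 0.1 trades precision for recall with no retraining at all, which is the usual argument for tuning the threshold instead of resampling the data.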

Now, I've seen plenty of examples online (Kaggle, Medium posts, etc.) implementing SMOTE, undersampling, class weights and such. Most results I've seen improve recall at the cost of degrading precision at a specific threshold, but don't really change or improve ROC AUC.
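That pattern is easy to reproduce. A minimal sketch using `class_weight` (one of the techniques mentioned above) on synthetic data — the dataset and numbers here are illustrative assumptions, not taken from any of the linked examples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Illustrative imbalanced dataset: roughly 5% positives.
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.95], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

results = {}
for cw in (None, "balanced"):
    clf = LogisticRegression(max_iter=1000, class_weight=cw).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    yhat = p >= 0.5  # fixed 0.5 threshold in both cases
    results[cw] = (precision_score(y_te, yhat),
                   recall_score(y_te, yhat),
                   roc_auc_score(y_te, p))
    print(f"class_weight={cw}: precision/recall/AUC = {results[cw]}")
```

Typically the weighted model gains recall and loses precision at the 0.5 threshold, while ROC AUC barely moves — i.e. the weighting mostly shifts the model's operating point rather than improving its ranking of cases.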

However, I've also seen some examples (albeit simulated) where such techniques actually improve ROC AUC (and even PR AUC) dramatically, such as this one.

Why is that so? If the consensus is that imbalanced datasets are NOT a problem, then why is there an entire Python library (imblearn, built on scikit-learn) dedicated to these problems and techniques? Why are these techniques so popular in learning resources? Do they work at all? Why? I can't seem to find enough evidence for either side of this matter.

  • I've been working on this; in the experiments I have performed using (kernel) logistic regression it mostly doesn't work. I suspect that balancing is overused (because there are packages and a profusion of blogs saying that you should balance the dataset) and we generally only hear about the cases where it works (or where the evaluation is performed badly), so there is some observer bias. There is a class imbalance problem, but it goes away as the overall size of the dataset increases and only affects *very* small datasets. – Dikran Marsupial Nov 01 '21 at 05:10
  • @DikranMarsupial That looks like an answer to me! – Dave Nov 01 '21 at 10:32
  • @Dave at the moment it is just opinion (at least until I have finished the study), but I am working on a proper answer for this question in the long term! ;o) – Dikran Marsupial Nov 01 '21 at 10:47
  • @DikranMarsupial Right, I've read similar opinions. But I can't seem to find evidence for either side of this matter, which is just what I said in my question. – eduardokapp Nov 01 '21 at 14:36
  • @eduardokapp It appears that nobody can give a test for whether a dataset/classifier exhibits the class imbalance problem (even when a bounty was on offer) https://stats.stackexchange.com/questions/539638/how-do-you-know-that-your-classifier-is-suffering-from-class-imbalance. There is clearly something wrong if so many sources recommend fixing a problem that can't even be diagnosed! I'm hoping to be able to provide something more substantive in the medium term. – Dikran Marsupial Nov 01 '21 at 14:50
  • One should keep in mind that [the AUC is only a semi-proper scoring rule](https://stats.stackexchange.com/q/339919/1352), so whether under/oversampling improves it does not necessarily mean we are better off - we may have gotten a better AUC by biasing our probabilistic classifications. (I haven't dug through the link you provide.) [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352), and note the discussion in the comments about "why then is balancing so often taught?" – Stephan Kolassa Nov 01 '21 at 18:11
  • My example shows that a better Brier score does not necessarily mean we are better off (in circumstances where the [weighted] accuracy is the quantity of primary interest for the application). – Dikran Marsupial Nov 03 '21 at 05:40

0 Answers