
Say you're training a classifier to take an input $x$ and predict its label $y \in \{1,\ldots,k\}$.

As an example, let's say the classifier is a neural net that ends in a softmax layer, and we train it using gradient descent to minimize the cross-entropy (i.e., maximize the average log probability assigned to the correct label).
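For concreteness, here's a minimal sketch of the setup I mean (PyTorch is just an illustrative choice; the architecture and sizes are placeholders):

```python
import torch
import torch.nn as nn

k, d = 5, 20  # number of labels and input dimension (placeholders)

# A small net ending in a linear layer; the softmax is folded into the loss.
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, k))
loss_fn = nn.CrossEntropyLoss()        # cross-entropy = -mean log p(correct label)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, d)                 # dummy batch of inputs
y = torch.randint(0, k, (32,))         # dummy labels in {0, ..., k-1}

loss = loss_fn(model(x), y)            # one gradient-descent step
opt.zero_grad()
loss.backward()
opt.step()
```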

My question is: if your training set isn't uniformly distributed across all $k$ labels, then that bias will get built into your classifier. What are some ways I can modify things so that the classifier is forced to have a uniform prior on all $k$ labels? It'd be nice if you could mention the drawbacks of those fixes as well.

To my knowledge, two big ways to address this are:

  • Train on samples from each label equally. E.g. if you have 200 samples with label 1, and 400 with label 2, then you train on label 1 samples twice as often.
  • Artificially generate more samples from labels you have less of.

But neither of these seems ideal to me. The first doesn't seem like an optimally efficient use of the data, and the second seems highly finicky, depending on your ability to generate realistic samples for a given label. (A rough sketch of the first approach is below.)
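For reference, here's roughly what I picture the first approach looking like (again a PyTorch sketch; `WeightedRandomSampler` is just one way to equalize how often each label is drawn):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Dummy imbalanced data: 200 samples of label 0 and 400 of label 1.
x = torch.randn(600, 20)
y = torch.cat([torch.zeros(200, dtype=torch.long), torch.ones(400, dtype=torch.long)])

# Weight each sample inversely to its class frequency, so every label is drawn
# equally often in expectation (label-0 samples get drawn twice as often here).
class_counts = torch.bincount(y).float()
sample_weights = 1.0 / class_counts[y]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)

loader = DataLoader(TensorDataset(x, y), batch_size=32, sampler=sampler)
```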

One idea I have is to fix the bias of the output layer (the one feeding the softmax) to a uniform value, $b = -\log(k)\,\mathbf{1}_k$, since it's the bias that controls the prior in e.g. logistic regression. Would this work, or would it throw off the training of the neural net?
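Concretely, something like this (a PyTorch sketch of my idea; I realize any constant bias vector gives a uniform softmax, $-\log(k)$ just makes the outputs exactly $1/k$ when the weight contributions are zero):

```python
import math
import torch
import torch.nn as nn

k, d = 5, 64  # placeholder sizes

final = nn.Linear(d, k)  # the layer feeding the softmax
with torch.no_grad():
    final.bias.fill_(-math.log(k))  # equal biases -> uniform softmax (1/k each) when the weights contribute nothing
final.bias.requires_grad_(False)    # exclude the bias from training
```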

chausies
Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Aug 20 '21 at 21:38

0 Answers