
In a speaker recognition problem I have 330 speakers (classes) as targets and want to predict their identities with a feedforward neural net that has a softmax output layer.

The problem is that some classes have up to 10 times more training data available than the smaller ones. If I don't do any class balancing, the results are rather poor because the big classes tend to dominate and the smaller classes are often misclassified.

One easy solution is to throw away a lot of training data from the bigger classes so that the dataset is balanced, which also speeds up training a bit. That works to some extent, but it seems very sub-optimal.

Maybe I could instead replicate samples from the smaller classes until each of them has as many examples as the biggest class? This would make training slower, but at least I wouldn't throw away real training data.
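For concreteness, a minimal sketch of that replication idea (assuming my data sits in NumPy arrays `features` and `labels`; the helper name is just for illustration):

```python
import numpy as np

def oversample_by_replication(features, labels, seed=0):
    """Replicate samples of the smaller classes (with replacement)
    until every class has as many examples as the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()                       # size of the biggest class
    index_parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(labels == c)
        # draw (target - n) extra copies of this class, with replacement
        extra = rng.choice(idx, size=target - n, replace=True)
        index_parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(index_parts))
    return features[order], labels[order]
```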

I was wondering: is there a more elegant way of weighting the importance of the classes during training, so that I get better accuracy without the bigger classes "eating" all the small ones? Ideally there would be a method that keeps the unbalanced set as input, without having to upsample or downsample the training data of individual classes.

Steve3nto

1 Answer


You can use different weights for different samples or categories when computing the cost during training, e.g. a higher cost for samples from uncommon categories than for the others.

This is called the cost-sensitive method in this paper, which also describes several other approaches. I first found the paper here in a similar question.
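As a minimal sketch of this kind of class weighting, assuming PyTorch and a 1-D tensor of integer speaker labels (the label tensor below is only a random placeholder):

```python
import torch
import torch.nn as nn

NUM_SPEAKERS = 330

# placeholder standing in for the real label vector:
# one integer speaker id per training sample
train_labels = torch.randint(0, NUM_SPEAKERS, (5000,))

# per-class weights inversely proportional to class frequency,
# weight_c = N / (C * n_c), so rare speakers get a larger weight
counts = torch.bincount(train_labels, minlength=NUM_SPEAKERS).float()
counts = counts.clamp_min(1.0)                   # guard against empty classes
class_weights = counts.sum() / (NUM_SPEAKERS * counts)

# CrossEntropyLoss accepts a per-class weight vector and uses it
# when averaging the per-sample losses
criterion = nn.CrossEntropyLoss(weight=class_weights)

# in the training loop (the model outputs unnormalized logits;
# CrossEntropyLoss applies the softmax internally):
#   logits = model(batch_inputs)
#   loss = criterion(logits, batch_labels)
#   loss.backward()
```

Other frameworks have similar hooks, e.g. the `class_weight` argument of `model.fit` in Keras.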

There's also a question (unanswered at the time of writing), Tuning priors/weights/costs to counteract class imbalance, that might be related.

dontloo
  • This very recent paper seems to claim that weights do nothing for deep neural networks (I haven't read it carefully yet): https://arxiv.org/abs/1812.03372 – Tim Dec 12 '18 at 11:56
  • Me neither, the paper appeared on arXiv four days ago :) – Tim Dec 12 '18 at 13:38