I'm working on a speaker recognition problem with 330 speakers (classes) as targets, and I want to predict the speaker identity with a feedforward neural net that has a softmax output layer.
The problem is that some classes have up to 10 times more training data than the smaller ones. If I don't do any balancing of the classes, the results are quite bad because the big classes tend to dominate and the smaller classes are often misclassified.
One easy solution is to throw away training data from the bigger classes so the dataset is balanced, which also speeds up training a bit. It kind of works, but it seems very suboptimal.
Maybe I could instead replicate samples from the smaller classes until they have as many as the biggest class? That would make training slower, but at least I wouldn't be throwing away real training data.
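For example, instead of physically copying samples I could probably oversample the small classes on the fly with a weighted sampler. Here's a rough sketch of what I mean, assuming PyTorch (train_labels and train_dataset are just placeholders for my data):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# train_labels: 1-D array of integer speaker IDs (0..329), one per training sample
labels = np.asarray(train_labels)

# Give each sample a weight inversely proportional to its class frequency,
# so small classes get drawn more often (sampling with replacement).
class_counts = np.bincount(labels, minlength=330)
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(labels),   # one "epoch" still sees len(labels) samples
    replacement=True,
)
loader = DataLoader(train_dataset, batch_size=128, sampler=sampler)
```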
I was wondering: is there a more elegant way of weighting the importance of the classes during training, so that I get better accuracy without the bigger classes "eating" all the small ones? Ideally there would be a method that doesn't require upsampling or downsampling the training set of individual classes, i.e. one that keeps the unbalanced set as the input.
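What I'm imagining is something like per-class weights in the loss, so that mistakes on the small classes cost more. A minimal sketch of the idea, again assuming PyTorch (the inverse-frequency weighting is just one scheme I made up, and train_labels is a placeholder):

```python
import numpy as np
import torch
import torch.nn as nn

# train_labels: integer speaker IDs for the (unbalanced) training set
class_counts = np.bincount(train_labels, minlength=330)

# Inverse-frequency weights, normalized so the average weight is roughly 1.0.
weights = len(train_labels) / (330 * class_counts)
weights = torch.as_tensor(weights, dtype=torch.float32)

# CrossEntropyLoss expects raw logits (it applies softmax internally) and
# scales each sample's loss by the weight of its true class.
criterion = nn.CrossEntropyLoss(weight=weights)

# The training step itself stays the same:
# loss = criterion(model(batch_x), batch_y)
```

Is this kind of loss weighting the standard approach for this situation, or is there something better?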