I'm using Keras on top of Theano to train a neural network. How should I choose the batch size in relation to the number of classes? I have 560 classes, and with a batch size larger than 128 training no longer fits in memory. Would it help to make the batch size greater than the number of classes, say,
batch size = 3 * number of classes?
That way each batch would contain, on average, a few images from each class. I do understand that the data points in each batch are selected randomly, so this is not guaranteed.
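
To put rough numbers on that randomness, here is a back-of-the-envelope sketch. It assumes a perfectly balanced dataset and uniform sampling with replacement, which only approximates how shuffled batches are actually drawn. `expected_distinct_classes` is a hypothetical helper written for this illustration, not a Keras function.

```python
def expected_distinct_classes(batch_size, num_classes):
    """Expected number of distinct classes in one random batch.

    Assumes every class is equally likely for each sample and that
    samples are drawn independently (with replacement).
    """
    # Probability that one particular class is absent from the whole batch.
    p_class_missing = (1.0 - 1.0 / num_classes) ** batch_size
    # By linearity of expectation, sum the presence probability over all classes.
    return num_classes * (1.0 - p_class_missing)


num_classes = 560
for batch_size in (128, num_classes, 3 * num_classes):
    covered = expected_distinct_classes(batch_size, num_classes)
    print(f"batch_size={batch_size}: about {covered:.0f} of {num_classes} classes expected")
```

Under these assumptions, even a batch of 3 * 560 samples is not expected to cover every class, and a batch of 128 can cover at most 128 of them.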