Hey there,
I have a question about a topic that has been discussed many times before, but for which I could not find a satisfying answer.
I'm working with a self-generated dataset that consists of only 340 data points. For experimental reasons, the dataset is not balanced: it contains eleven classes, ranging from 60 events down to 7 events per class. Because of the origin of the data, I cannot use common augmentation algorithms, so we developed our own. I trained a model on these data. It performs pretty well and generalizes satisfactorily. I also tested different amounts of augmentation and compared the resulting model performance.
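To make the balancing option concrete, here is a minimal sketch of how the number of augmented samples per class could be computed if every class were brought up to the size of the largest one. The function name `augmentation_targets` and the toy labels are made up for illustration; they are not my real data or our actual augmentation code.

```python
from collections import Counter

def augmentation_targets(labels, target=None):
    """Return how many augmented samples each class needs to reach `target`.

    If `target` is None, the size of the largest class is used.
    Classes already at or above the target get 0.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(target - n, 0) for cls, n in counts.items()}

# Hypothetical toy labels (NOT the real class counts from my dataset)
toy_labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(augmentation_targets(toy_labels))
```

Passing an explicit `target` would allow partial balancing, which is roughly what I mean by testing different amounts of augmentation.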
My question now: is it a good idea to use data augmentation to balance out the dataset, even though the current, unbalanced setup already produces a well-performing model? Or do I actually not need a balanced dataset as long as the model performs to my satisfaction?
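One way to pin down "performs to my satisfaction" on an imbalanced dataset is to look at recall per class and its unweighted mean (often called balanced accuracy) rather than overall accuracy alone. The following is only a sketch under that assumption; the helper names and the toy labels are hypothetical and do not come from my project.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Fraction of correctly predicted samples for each true class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Hypothetical toy example: majority class "a", minority class "b"
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "a", "b"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # overall accuracy
print(per_class_recall(y_true, y_pred))  # recall for each class
print(balanced_accuracy(y_true, y_pred)) # mean of per-class recalls
```

In this toy case the overall accuracy looks better than the balanced accuracy because the minority class is only half right, which is the kind of gap I would want to rule out for my smallest classes.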
My concern is the integrity of my model. It is to be published as part of a larger project and I just want to make sure that it stands up to the review process.
I welcome any ideas and feedback on this topic.