I'm training a neural network to classify samples into 4 classes. The class distribution in the dataset is as follows:
Class 0: 0.516
Class 1: 0.159
Class 2: 0.235
Class 3: 0.088
It's clear that class 0 is over-represented, while class 3 is almost non-existent. These proportions are roughly preserved after the split into training, validation and test sets. Overall, the network's accuracy is poor, even though it achieves relatively good confusion-matrix metrics (precision, recall and F1 score). I believe the network is overfitting to class 0: the training cost follows a downward trend, while the validation and test errors remain high. L2 regularization does not seem to help much.
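For reference, here is a quick sketch (plain Python; the variable names are my own) of the inverse-frequency class weights this distribution would imply, in case a weighted loss is one of the options. I have not actually tried this yet:

```python
# Class proportions from the dataset above.
class_freqs = {0: 0.516, 1: 0.159, 2: 0.235, 3: 0.088}

# Inverse-frequency weights, normalized so they sum to the number of
# classes; a common heuristic for reweighting the loss per class.
inv = {c: 1.0 / p for c, p in class_freqs.items()}
total = sum(inv.values())
weights = {c: len(class_freqs) * w / total for c, w in inv.items()}

print(weights)  # class 3 gets the largest weight, class 0 the smallest
print(class_freqs[0] / class_freqs[3])  # majority/minority ratio, about 5.9x
```

(In PyTorch, for instance, such weights could be passed to `torch.nn.CrossEntropyLoss(weight=...)`.)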
Do you have any suggestions on how to tackle this situation?