What is the intuition behind what makes dice coefficient handle imbalanced data?

Question

I am writing my master's thesis right now, a deep learning project on semantic segmentation of MRI images. My partner and I have been looking at using dice loss instead of categorical cross-entropy, because a couple of papers state that it might give better results on the segmentation task.

In the thread Dice-coefficient loss function vs cross-entropy, however, it is stated that this is not necessarily true and that one has to test the claim empirically.

I have been staring at the equation for dice loss for quite some time now, and I do not understand why "one does not have to assign weights to samples of different classes to establish the right balance" or why "In addition, Dice coefficient performs better at class imbalanced problems by design".

If anyone could help me get a better intuition for why dice loss is better than cross-entropy for class-imbalanced problems, I would be super happy.

As an extra: this paper introduces a "generalized dice loss" where each class is scaled by a weight parameter that is inversely proportional to the number of voxels belonging to that class. In this case I absolutely understand how it combats class imbalance. https://arxiv.org/pdf/1707.03237.pdf

score 8 · Accepted Answer · answered Nov 29 '19 at 13:19

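Dice score measures the relative overlap between the prediction and the ground truth. It is closely related to intersection over union but not identical: Dice is twice the intersection divided by the sum of the two region sizes. It has the same value for small and large objects alike: did you guess half of the object correctly? Great, your loss is the same (1/3, if you predicted nothing outside the object) either way. I don't care if the object was 10 or 1000 pixels large.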
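A minimal sketch of this size invariance, assuming binary masks flattened into plain lists of 0/1. The helper names `dice_score` and `half_hit` are made up for illustration and do not come from any library:

```python
def dice_score(pred, target):
    """Dice = 2 * |intersection| / (|pred| + |target|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 2.0 * intersection / (sum(pred) + sum(target))


def half_hit(size):
    """Ground truth is an object of `size` pixels; the prediction covers half of it."""
    target = [1] * size
    pred = [1] * (size // 2) + [0] * (size - size // 2)
    return pred, target


small = dice_score(*half_hit(10))    # 2*5 / (5 + 10)
large = dice_score(*half_hit(1000))  # 2*500 / (500 + 1000)

# Both scores are 2/3, so the Dice loss (1 - Dice) is 1/3 for either object size.
print(small, large)
```

The object size cancels out of the ratio, which is why no per-class weighting is needed to make a 10-pixel object count as much as a 1000-pixel one.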

On the other hand, cross-entropy is evaluated on individual pixels, so large objects contribute more to it than small ones, which is why it requires additional weighting to avoid ignoring minority classes.
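To make the pixel-count effect concrete, here is a rough sketch under a simplifying assumption: the model assigns the same probability (0.5, chosen arbitrarily) to the true class on every foreground pixel of a 10-pixel and a 1000-pixel object, and the loss is the unweighted sum over pixels:

```python
import math


def pixel_cross_entropy(prob_of_true_class):
    """Cross-entropy of a single pixel: -log(probability assigned to the true class)."""
    return -math.log(prob_of_true_class)


small_pixels, large_pixels = 10, 1000
p = 0.5  # assumed confidence on every foreground pixel (illustrative value)

small_loss = small_pixels * pixel_cross_entropy(p)
large_loss = large_pixels * pixel_cross_entropy(p)

# Share of the summed loss that comes from the large object: 1000 / 1010.
large_share = large_loss / (small_loss + large_loss)
print(f"{large_share:.3f}")  # 0.990
```

Both objects are predicted equally badly, yet about 99% of the loss (and therefore of the gradient signal) comes from the large one.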

A problem with dice is that it can have high variance. Getting a single pixel wrong in a tiny object can have the same effect as missing nearly a whole large object, so the loss becomes highly dependent on the current batch. I don't know the details of the generalized dice, but I assume it helps fight this problem.
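A small numeric illustration of that variance, using made-up sizes: a 1-pixel object whose prediction lands on the neighbouring pixel, versus a 1000-pixel object where only a single pixel is found. The function `dice_loss` is a hypothetical helper that works on counts rather than on masks:

```python
def dice_loss(intersection, pred_size, target_size):
    """Dice loss from pixel counts: 1 - 2*|intersection| / (|pred| + |target|)."""
    return 1.0 - 2.0 * intersection / (pred_size + target_size)


# Tiny object: 1 ground-truth pixel, prediction is off by one pixel (no overlap).
tiny_miss = dice_loss(intersection=0, pred_size=1, target_size=1)

# Large object: 1000 ground-truth pixels, prediction finds only one of them.
large_miss = dice_loss(intersection=1, pred_size=1, target_size=1000)

print(tiny_miss, large_miss)  # 1.0 and 1 - 2/1001
```

One misplaced pixel costs about as much as losing 999 pixels of the large object, so which objects happen to be in the batch strongly affects the loss value.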

Thank you for the answer. I think what confused me is that our problem is not a binary classification task: we treat the background (the overrepresented class in our case) as one of four classes. When one computes a dice score for every class and then averages over all four, we do not seem to get the benefit of handling class imbalance, but when one computes the overlap for one specific class, the benefit becomes quite obvious. Am I thinking about it correctly? — Laaggan, Nov 30 '19 at 09:19

score 2 · Answer 2 · answered Apr 01 '20 at 00:06

A two-class problem is intuitive: dice calculates the overlap between foreground regions (the background is not of interest). In a multi-class scenario, given probability maps on one side and one-hot encoded labels on the other, it effectively solves multiple two-class problems. In practice, one would calculate a vector of per-class dice scores, take its mean, and return 1 - mean. (I have attached an implementation in the latest source code of DeepMedic for reference.)

Here, scores for each class are calculated independently of their relative sizes and hence contribute fairly to the mean score.
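The per-class recipe can be sketched as follows. This is not the DeepMedic implementation, only a plain-Python illustration. The function names, the nested-list layout (`probs[c][i]` for class `c` at pixel `i`) and the smoothing term `eps` are assumptions made for the example:

```python
def per_class_dice(probs, onehot, eps=1e-7):
    """Soft Dice per class.

    probs[c][i]  : predicted probability of class c at pixel i
    onehot[c][i] : 1 if pixel i belongs to class c, else 0
    eps          : small constant to avoid division by zero for absent classes
    """
    scores = []
    for p_c, t_c in zip(probs, onehot):
        intersection = sum(p * t for p, t in zip(p_c, t_c))
        scores.append((2.0 * intersection + eps) / (sum(p_c) + sum(t_c) + eps))
    return scores


def mean_dice_loss(probs, onehot):
    """Return 1 - mean of the per-class Dice scores."""
    scores = per_class_dice(probs, onehot)
    return 1.0 - sum(scores) / len(scores)


# 10 pixels: 8 background (class 0) and 2 foreground (class 1).
onehot = [[1] * 8 + [0] * 2, [0] * 8 + [1] * 2]
# Hard prediction: all background correct, but one foreground pixel called background.
probs = [[1] * 8 + [1, 0], [0] * 8 + [0, 1]]

scores = per_class_dice(probs, onehot)  # roughly [16/17, 2/3]
loss = mean_dice_loss(probs, onehot)    # roughly 0.196
print(scores, loss)
```

Pixel accuracy here is 90%, yet the single foreground mistake drops the foreground score to about 2/3, and that score carries the same weight in the mean as the near-perfect background score.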
