I have an extremely imbalanced dataset (millions of times more negatives than positives) for a binary classification NN model. I am aggressively downsampling the majority class solely to make training time manageable (not to be confused with downsampling in order to bias the model, make accuracy easier to interpret, etc.; those problems I can fix separately by adjusting the classification threshold). In other words, an unbiased sample of my data would require tens of millions of observations just to contain a few positives, which is not practical.
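For concreteness, here is the kind of downsampling I mean (a minimal sketch with synthetic placeholder data; the DataFrame, column name, and rates are all illustrative, not my real setup):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the real data: a rare positive class (far milder than mine).
n = 1_000_000
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "label": rng.random(n) < 1e-3,   # ~0.1% positives
})

downsample_rate = 1e-3  # fraction of negatives to keep (illustrative)

pos = df[df["label"]]                                   # keep every positive
neg = df[~df["label"]].sample(frac=downsample_rate, random_state=0)

# Recombine and shuffle; the result is roughly balanced and small enough to train on.
train_df = pd.concat([pos, neg]).sample(frac=1.0, random_state=0)
```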
My understanding is that once you downsample the majority class like this, you are supposed to weight the loss function in order to "calibrate" the output probabilities. That is, I am feeding in roughly balanced data, so the average NN prediction will be in the 0.5 ballpark; but since the actual positivity rate is 0.0...01, the output probabilities should generally be much lower. This article describes this as upweighting, or calibrating the downsample.
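To spell out the arithmetic as I understand it: if each negative is kept with probability $r$ (the downsample rate), every retained negative stands in for $1/r$ originals, so weighting the negative-class loss by $1/r$ should restore the original class prior in the weighted empirical risk:

$$\hat{P}_w(y=1) \;=\; \frac{n_+}{n_+ + \frac{1}{r}\, n_-^{\text{kept}}} \;\approx\; \frac{n_+}{n_+ + n_-^{\text{full}}}, \qquad \text{since } \tfrac{1}{r}\, n_-^{\text{kept}} \approx n_-^{\text{full}}.$$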
To do this, I am using the `class_weight` argument of the tf.keras (TF 2) `model.fit` step, set to `{False: 1/downsample_rate, True: 1}`, where `1/downsample_rate` is a very big number.
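In code, my setup looks roughly like the following sketch (the data and architecture are placeholders I made up for the question; only the `class_weight` line is the part I'm asking about):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Placeholder training data standing in for my (already downsampled) set.
X_train = rng.normal(size=(20_000, 10)).astype("float32")
y_train = (rng.random(20_000) < 0.5).astype("int32")   # roughly balanced post-downsample

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])

downsample_rate = 1e-6  # fraction of negatives kept (illustrative)
model.fit(
    X_train, y_train,
    epochs=5,
    batch_size=256,
    # Keras expects integer class keys; {False: ..., True: ...} also works
    # because False/True hash identically to 0/1 in Python.
    class_weight={0: 1.0 / downsample_rate, 1: 1.0},
)
```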
Without this argument, I get a very good model (high ROC AUC), except that the predicted probabilities are far too large: the NN thinks the data is balanced. When I add the argument, ROC performance drops dramatically, and the probabilities are still fairly large. My understanding is that the ROC is unaffected by class weighting (both axes are within-class rates, so each is normalized per class) and by any monotone rescaling of the probabilities (it depends only on the rank of the predictions), so it seems the model is actually getting worse.
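A quick self-contained sanity check of that rank-invariance claim, using synthetic scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
p = 0.1 + 0.5 * rng.random(1000) + 0.3 * y   # synthetic scores, higher for positives

auc = roc_auc_score(y, p)
# Any strictly increasing transform preserves ranks, hence the ROC/AUC:
assert np.isclose(auc, roc_auc_score(y, 0.01 * p))               # global rescale
assert np.isclose(auc, roc_auc_score(y, np.log(p / (1.0 - p))))  # logit transform
```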
Any thoughts or suggestions? Why is this happening, am I taking the correct approach, and is there a better way to "calibrate" probabilities in a NN after downsampling?
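For reference, the kind of post-hoc fix I have in mind as an alternative (or complement) to loss weighting is the standard prior-shift correction: train unweighted on the downsampled data, then scale the negative odds back up by `1/downsample_rate`, which is equivalent to adding $\log(\text{downsample rate})$ to the logit. A sketch (the function name is my own):

```python
import numpy as np

def correct_for_downsampling(p_model, downsample_rate):
    """Map probabilities learned under negative downsampling back to the
    original prior by re-inflating the negative odds by 1/downsample_rate."""
    p = np.asarray(p_model, dtype=float)
    return downsample_rate * p / (downsample_rate * p + (1.0 - p))

# A "balanced-looking" 0.5 maps back to roughly the true base rate:
print(correct_for_downsampling(0.5, 1e-6))   # ~1e-6
```

This would let me keep the good unweighted model and fix the probabilities afterward, but I would still like to understand why the weighted loss hurts the ranking.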
Note: I tried this same approach on a random forest and (surprisingly) also saw a small decrease in AUC from weighting the loss, although the difference was far less dramatic.