
This is less a question about sklearn's implementation and more a theoretical one. I find it weird that we'd do isotonic regression against target values in {0, 1}, because that could result in very jagged results. Why not use the calibration curve to do the calibration instead?

So to give you an example: I had a binary classification problem that was imbalanced, with roughly 95% positives. I trained on rebalanced data and tried validating on the unbalanced data. Of course that didn't turn out so great, so I went for calibration via isotonic regression.

So here's the "normal" way of doing it:

from sklearn.isotonic import IsotonicRegression


def predict_via_isotonic_calibration(y_true, y_prob):
    """
    y_true is an array of binary targets
    y_prob is an array of predicted probabilities from an uncalibrated classifier
    """
    # Fit a monotone map from the uncalibrated probabilities to the raw 0/1 targets
    iso_reg = IsotonicRegression(out_of_bounds='clip').fit(y_prob, y_true)
    calibrated_y_prob = iso_reg.predict(y_prob)
    return calibrated_y_prob

Which gave me this (calibrated vs uncalibrated):

[plot: calibration curves, calibrated vs uncalibrated]
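(As an aside, I believe this is roughly what sklearn does internally when you use CalibratedClassifierCV with method='isotonic' on an already-fitted classifier. A minimal sketch, where clf, X_val and y_val are assumed placeholders for a prefit binary classifier and a held-out validation set; the exact API for prefit estimators has shifted a bit across sklearn versions:)

from sklearn.calibration import CalibratedClassifierCV

# clf is assumed to be an already-fitted binary classifier with predict_proba;
# X_val, y_val are an assumed held-out validation set used only for calibration.
calibrator = CalibratedClassifierCV(clf, method='isotonic', cv='prefit')
calibrator.fit(X_val, y_val)
calibrated_y_prob = calibrator.predict_proba(X_val)[:, 1]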

Whereas I think it should be more like:

from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression


def predict_via_isotonic_calibration(y_true, y_prob, n_bins=80):
    # Bin the predictions first, then regress against the empirical positive rate per bin
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)
    iso_reg = IsotonicRegression(out_of_bounds='clip').fit(prob_pred, prob_true)
    calibrated_y_prob = iso_reg.predict(y_prob)
    return calibrated_y_prob

Which gave me this much nicer calibration:

[plot: calibration curve after fitting isotonic regression to the binned calibration curve]

So what gives? Is my idea a thing? Or is it wrong for some reason I'm overlooking?
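For anyone who wants to reproduce this kind of plot, something like the following works (a sketch, not the exact code behind the figures above; y_val and y_prob_uncalibrated stand in for the validation labels and the uncalibrated predicted probabilities):

import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# y_val, y_prob_uncalibrated are assumed placeholders for the validation labels
# and the uncalibrated predicted probabilities.
def plot_calibration(y_true, y_prob, n_bins=80, label=''):
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=n_bins)
    plt.plot(prob_pred, prob_true, marker='.', label=label)

plot_calibration(y_val, y_prob_uncalibrated, label='uncalibrated')
plot_calibration(y_val, predict_via_isotonic_calibration(y_val, y_prob_uncalibrated),
                 label='calibrated')
plt.plot([0, 1], [0, 1], linestyle='--', label='perfectly calibrated')
plt.xlabel('mean predicted probability')
plt.ylabel('fraction of positives')
plt.legend()
plt.show()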

  • I wonder whether these plots aren't a little misleading; especially in the first one, getting an average of 0 suggests that there are very few datapoints in those bins. Binning by quantiles instead of equal-width might give a clearer view? – Ben Reiniger Nov 13 '20 at 03:27
  • @BenReiniger You're exactly right. I'm trying to question whether it makes more sense to bin at all. After examining the problem more I convinced myself that the "normal" way of doing it is the right way. My objection was originally that doing regression against {0, 1} could result in jagged results. Sure, the calibration curve looks jagged, but like you say, sometimes there are very few data points in a bin, and that's just because the dataset is very unbalanced. The jagged bits _look_ bad, but they really aren't, because they are just artifacts of binning. – Alexander Soare Nov 13 '20 at 08:52
  • In fact, even though the bottom calibration _looks_ better, its roc_auc_score is slightly lower. – Alexander Soare Nov 13 '20 at 08:53

1 Answer


After examining the problem more, I convinced myself that the official way to do it is the right way. My objection was originally that doing regression against {0, 1} could result in jagged results. But actually, that's the basis for logistic regression! Isotonic regression is not fundamentally different in that sense. Here's a bad drawing to explain why this is okay:

[hand-drawn sketch: regression curves fit directly to 0/1 targets]

In the same way, neither logistic regression nor isotonic regression has any problem regressing against a set of targets containing just 0s and 1s.
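To make that concrete, here's a small self-contained sketch (synthetic data, nothing to do with the original problem) fitting both models directly to 0/1 labels; both produce monotone probability estimates even though every target is exactly 0 or 1:

import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic scores in [0, 1]; the true probability of a positive equals the score
scores = rng.uniform(0, 1, size=5000)
labels = (rng.uniform(0, 1, size=5000) < scores).astype(int)

# Logistic regression against the {0, 1} targets
log_reg = LogisticRegression().fit(scores.reshape(-1, 1), labels)
p_logistic = log_reg.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression against the same {0, 1} targets
iso_reg = IsotonicRegression(out_of_bounds='clip').fit(scores, labels)
p_isotonic = iso_reg.predict(scores)

# Both outputs are monotone in the score and roughly track the true probability
print(np.abs(p_logistic - scores).mean(), np.abs(p_isotonic - scores).mean())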

As for why my first graph looks "jagged":

[plot: the jagged calibration curve from the first approach]

As a commenter made me realise, it's just an artifact of uniform binning combined with the fact that the dataset is imbalanced, with roughly 95% positives. Some bins just happened to have a single point in them, and that point was predicted wrong even after calibration.
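(A quick way to see that, following the comment's suggestion: recent versions of sklearn's calibration_curve can bin by quantiles instead of equal width, so every bin holds roughly the same number of points. A sketch, where y_val and y_prob are assumed placeholders for the validation labels and predicted probabilities:)

from sklearn.calibration import calibration_curve

# Equal-count bins instead of equal-width bins; this avoids nearly empty bins
# when the predictions are concentrated at one end of [0, 1].
prob_true, prob_pred = calibration_curve(y_val, y_prob, n_bins=20, strategy='quantile')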

So really, my idea of using bucketed rates instead of the actual target values just interferes with the intended behaviour of isotonic regression. In fact, it turns out that the roc_auc_score is still slightly better for the "official" recalibration method than for my proposed one.
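A rough sketch of that comparison (not the exact code; predict_via_binned_isotonic_calibration is a hypothetical rename of the binned version from the question, and y_val / y_prob_uncalibrated stand in for the validation labels and uncalibrated predictions):

from sklearn.metrics import roc_auc_score

# "Official" approach: isotonic regression fit directly against the 0/1 targets
auc_official = roc_auc_score(
    y_val, predict_via_isotonic_calibration(y_val, y_prob_uncalibrated))

# Binned approach: isotonic regression fit against the binned calibration curve
auc_binned = roc_auc_score(
    y_val, predict_via_binned_isotonic_calibration(y_val, y_prob_uncalibrated))

print(auc_official, auc_binned)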

  • Calibration generally doesn't affect the rank-ordering, so roc_auc will generally be the same after calibration. Isotonic regression is piecewise-constant though, so you _do_ affect the rank-ordering in that you've lumped some values together, and so the ROC curve is a little more coarse. – Ben Reiniger Nov 13 '20 at 15:05