
I have an imbalanced dataset and am using XGBoost to create a predictive model.

    from xgboost import XGBClassifier

    xgb = XGBClassifier(scale_pos_weight=10, reg_alpha=1)

Although my recall and specificity are acceptable, I would like to improve the calibration curve. When I try using isotonic regression to calibrate my model, my predictive performance (recall and specificity) decreases dramatically. Any thoughts on how to maintain my metrics while improving calibration?

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import cross_val_score
    import numpy as np

    # CalibratedClassifierCV takes the estimator and calibration settings,
    # not the training data; the data go to fit()
    calibrated = CalibratedClassifierCV(xgb, method='isotonic', cv=5)
    calibrated.fit(X_train, y_train)
    scores = cross_val_score(calibrated, X_test, y_test, cv=5, scoring='recall')
    print(np.mean(scores))
    # prints 0.17

    xgb.fit(X_train, y_train)
    scores = cross_val_score(xgb, X_test, y_test, cv=5, scoring='recall')
    print(np.mean(scores))
    # prints 0.74
    

Thanks in advance!


arjunv0101
  • Does [this answer](https://stats.stackexchange.com/a/263411/87362) help? – runr Mar 05 '21 at 22:51
  • Or [this](https://stackoverflow.com/questions/30285551/why-does-calibratedclassifiercv-underperform-a-direct-classifer) – runr Mar 05 '21 at 23:02
  • @runr thank you for these comments. They help, but I don't know why my classification accuracy, especially sensitivity, decreases after calibration. Is this just a side effect of calibration? – arjunv0101 Mar 06 '21 at 18:49
  • It's possible, especially if the estimator variance across the different folds is significant. You could also try not retraining the models, but just passing the probabilities (there was an example in some answer); the performance should then be at least somewhat similar. The ``SplineCalibratedClassifierCV`` method from [this answer](https://stackoverflow.com/a/41603531/3629151) could also be useful; I will need to try it myself, since it looks promising. – runr Mar 06 '21 at 20:14

1 Answer


Because of the weighting, your model predicts probabilities that are uniformly too large. Since you use the default cutoff probability of 0.5, you naturally get high recall (but you should get relatively low specificity; your text suggests maybe that's not the case, but the code snippet doesn't reference specificity). When you calibrate, those probabilities are brought down to a more correct range, and so you predict (with cutoff 0.5) far fewer examples as the positive class, and hence a lower recall.
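A quick way to see this (a rough sketch, assuming the fitted `xgb` and `calibrated` objects from your snippet) is to compare the mean predicted probability with the observed positive rate:

    # Mean predicted probability vs. the observed positive rate: with
    # scale_pos_weight=10 the raw model's mean tends to sit well above the
    # base rate, while the calibrated model's mean should sit much closer to it.
    print(y_test.mean())                                  # observed positive rate
    print(xgb.predict_proba(X_test)[:, 1].mean())         # typically inflated
    print(calibrated.predict_proba(X_test)[:, 1].mean())  # closer to the base rate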

Calibration applies a monotone function to the original model's outputs, so the same recall and specificity (and so on) are still attainable, just at different cutoffs. Do some cutoff optimization and you should recover more-or-less the same performance. (CalibratedClassifierCV by default ends up as an ensemble model, so things will change a little.)
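For instance, a minimal sketch of cutoff tuning (the `X_valid`/`y_valid` names are illustrative; ideally this is a held-out set, not the data used to fit the calibrator):

    import numpy as np
    from sklearn.metrics import recall_score

    # Calibrated probability of the positive class
    proba = calibrated.predict_proba(X_valid)[:, 1]

    # Sweep candidate cutoffs, recording sensitivity (recall) and specificity,
    # then pick the cutoff that recovers the trade-off you had before calibration.
    for cutoff in np.linspace(0.05, 0.95, 19):
        preds = (proba >= cutoff).astype(int)
        sens = recall_score(y_valid, preds)                # recall of class 1
        spec = recall_score(y_valid, preds, pos_label=0)   # recall of class 0
        print(f"cutoff={cutoff:.2f}  recall={sens:.3f}  specificity={spec:.3f}")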

Better yet, evaluate with a proper scoring rule (e.g. Brier score or log loss), or at least something that isn't cutoff-dependent.
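For example (a minimal sketch, reusing the fitted models from the question):

    from sklearn.metrics import brier_score_loss, log_loss

    # Cutoff-independent comparison: lower is better for both metrics, and the
    # calibrated model should improve on them even though its recall at the
    # default 0.5 cutoff looks worse.
    for name, model in [("uncalibrated", xgb), ("calibrated", calibrated)]:
        p = model.predict_proba(X_test)[:, 1]
        print(name, brier_score_loss(y_test, p), log_loss(y_test, p))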

Ben Reiniger