
I am currently working with a slightly imbalanced dataset (9% positive outcome) and am using XGBoost to train a predictive model.

    from xgboost import XGBClassifier

    XGB = XGBClassifier(scale_pos_weight=10)

Before calibration, my sensitivity and specificity are around 80%, but the calibration curve has a slope of 0.5.

After calibration, the calibration curve looks great (slope = 0.995), but sensitivity and specificity have decreased dramatically. Is this a side effect of the calibration? Any thoughts on how to maintain my classification accuracy?
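For what it's worth, the effect described above can be reproduced with a small sketch. Everything here is a stand-in for illustration: the question does not say how calibration was done, so this assumes scikit-learn's `CalibratedClassifierCV`, and it uses a class-weighted logistic regression on synthetic data in place of the weighted `XGBClassifier`:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the ~9%-positive dataset in the question.
X, y = make_classification(n_samples=4000, weights=[0.91, 0.09], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Upweighting the minority class (analogous to scale_pos_weight=10)
# inflates the predicted probabilities, so many cases clear the
# default 0.5 threshold and sensitivity looks high.
raw = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X_tr, y_tr)

# Calibration pulls the probabilities back toward the true base rate;
# far fewer cases now exceed 0.5, which is why sensitivity drops.
cal = CalibratedClassifierCV(
    LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000), cv=5
).fit(X_tr, y_tr)

print("mean predicted probability, raw:       ", raw.predict_proba(X_te)[:, 1].mean())
print("mean predicted probability, calibrated:", cal.predict_proba(X_te)[:, 1].mean())
```

The point of the sketch: the classifier itself does not get worse after calibration; the probabilities shrink toward the base rate, so the fixed 0.5 cutoff effectively becomes a much stricter threshold.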

Thanks!

arjunv0101

1 Answer


Unbalanced classes are almost certainly not a problem: Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?

Do not use accuracy to evaluate a classifier:
Why is accuracy not the best measure for assessing classification models?
Is accuracy an improper scoring rule in a binary classification setting?
Classification probability threshold

The same problems apply to sensitivity and specificity, and indeed to all evaluation metrics that rely on hard classifications. Instead, use probabilistic classifications, and evaluate these using proper scoring rules.

Stephan Kolassa