I am building an accelerated failure time (AFT) model on a highly imbalanced data set (90% survival, 10% death). I understand that the Brier score is a poor choice here because it does not fare well with imbalanced outcomes, so I am looking at the C-index and D-calibration to help validate my model. I am struggling to understand how to interpret the D-calibration score. The following paper is the best resource I could find explaining D-calibration, but I am still confused about how to interpret the result. If the p-value is close to significant, e.g., 0.052, what is that saying?
Effective Ways to Build and Evaluate Individual Survival Distributions, p. 16: https://arxiv.org/pdf/1811.11347.pdf
I am using this package to calculate D-calibration; it offers the following output options:
- statistic returns the chi squared test statistic
- pval returns the chi squared test p value
- max_deviation returns the maximum percentage deviation from the expected value, calculated as abs(expected_percentage - real_percentage), where expected_percentage = 1.0/n_bins
- histogram returns the full calibration histogram per bin
- all returns all of the above in a dictionary
https://loft-br.github.io/xgboost-survival-embeddings/modules/metrics.html
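To make the question concrete, here is a minimal sketch of how I understand these four outputs to be computed, based on my reading of the paper. It is not the package's code: the function name `d_calibration_sketch` and its arguments are hypothetical, and it assumes I already have each subject's predicted survival probability evaluated at their own observed time. Censored subjects are spread over the bins at or below their predicted probability, which is how I read the paper's treatment of censoring. The chi-squared test compares the bin counts against a uniform distribution, so the null hypothesis is that the model is D-calibrated.

```python
import numpy as np
from scipy.stats import chi2


def d_calibration_sketch(surv_at_time, event, n_bins=10):
    """Hypothetical helper, not the package's implementation.

    surv_at_time: predicted S_i(t_i) at each subject's own observed time
                  (event time or censoring time).
    event:        True if death was observed, False if censored.
    """
    p = np.clip(np.asarray(surv_at_time, dtype=float), 1e-12, 1.0)
    event = np.asarray(event, dtype=bool)
    width = 1.0 / n_bins
    # Bin [k/n_bins, (k+1)/n_bins) containing each p; p == 1.0 goes in the top bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)

    counts = np.zeros(n_bins)
    # An observed event adds one full count to its bin.
    np.add.at(counts, idx[event], 1.0)
    # A censored subject is spread over [0, p]: a partial weight in its own
    # bin and width / p in every lower bin, so the total weight is 1.
    for p_i, k in zip(p[~event], idx[~event]):
        counts[k] += (p_i - k * width) / p_i
        counts[:k] += width / p_i

    n = len(p)
    expected = n * width  # expected count per bin under perfect calibration
    statistic = float(np.sum((counts - expected) ** 2 / expected))
    pval = float(chi2.sf(statistic, df=n_bins - 1))
    histogram = counts / n
    max_deviation = float(np.max(np.abs(width - histogram)))
    return {
        "statistic": statistic,
        "pval": pval,
        "max_deviation": max_deviation,
        "histogram": histogram,
    }


if __name__ == "__main__":
    # Synthetic illustration only: 10% events, 90% censored, made-up predictions.
    rng = np.random.default_rng(0)
    surv = rng.uniform(0.0, 1.0, size=500)
    died = rng.uniform(size=500) < 0.1
    print(d_calibration_sketch(surv, died))
```

If this reading is right, a p-value of 0.052 would mean the bin counts are only just compatible with a uniform histogram at the 5% level, which is why I would also look at max_deviation and the histogram rather than the p-value alone.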