
Why does my XGBClassifier predict probabilities only between 0.48 and 0.51 for either class?

I'm very new to XGBoost, so any suggestions are greatly appreciated! Here's what I want to do using python:

  • I have a binary classification problem
  • I want to get predicted probability for thresholding (so I want predict_proba())
  • Based on what I've read, XGBClassifier supports predict_proba(), so that's what I'm using
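To make the goal concrete, here is a minimal sketch of the thresholding step I have in mind. The `proba` array stands in for the output of `model.predict_proba(val_X)` (hypothetical values, just for illustration):

```python
import numpy as np

# Hypothetical output of model.predict_proba(val_X):
# one row per sample, columns = [P(class 0), P(class 1)]
proba = np.array([[0.51, 0.49],
                  [0.48, 0.52],
                  [0.495, 0.505]])

# Take the positive-class column and apply a custom threshold
# instead of the fixed 0.5 that model.predict() uses internally.
threshold = 0.50
preds = (proba[:, 1] >= threshold).astype(int)
print(preds)  # [0 1 1]
```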

However, after I trained the model (hyperparameters at the end of the post), when I use model.predict_proba(val_X), the output only ranges from 0.48 to 0.51 for either class. Something like this:

[screenshot: predicted probabilities clustered between 0.48 and 0.51]

I've read a few other posts like "xgboost logistic regression predictions are returning values >1 and < 0" and "How does gradient boosting calculate probability estimates?", but I can't figure out whether predict_proba() is giving me log odds or predicted probabilities, or what exactly is happening.

An answer to the post "Unexpected probability distribution from xgboost binary classification" suggests that the model may not be learning anything from the data, hence the near-random probabilities. My ROC AUC is 0.766, and although I have a class imbalance (80% positive class), I used balanced class weights (n_samples / (n_classes * np.bincount(y))) during model training.
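For reference, this is how I computed the balanced weights. The toy labels below reproduce the 80/20 imbalance described above (illustrative values, not my actual data):

```python
import numpy as np

# Toy labels with the same imbalance as my data: 80% positive class
y = np.array([1] * 80 + [0] * 20)

# Balanced per-class weights: n_samples / (n_classes * count per class)
n_samples = len(y)
n_classes = 2
weights = n_samples / (n_classes * np.bincount(y))
print(weights)  # [2.5   0.625] -> weight for class 0, class 1

# XGBoost's scale_pos_weight is the ratio of negatives to positives,
# which for an 80% positive class comes out below 1
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
print(scale_pos_weight)  # 0.25
```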

Here are the parameters for my model (I've used optuna to tune my hyperparameters):

 {'objective': 'binary:logistic',
 'use_label_encoder': False,
 'base_score': 0.5,
 'booster': 'gbtree',
 'colsample_bylevel': 1,
 'colsample_bynode': 1,
 'colsample_bytree': 1,
 'enable_categorical': False,
 'gamma': 0.20869504071834755,
 'gpu_id': -1,
 'importance_type': None,
 'interaction_constraints': '',
 'learning_rate': 9.48345478e-05,
 'max_delta_step': 0,
 'max_depth': 9,
 'min_child_weight': 1,
 'missing': nan,
 'monotone_constraints': '()',
 'n_estimators': 15,
 'n_jobs': 8,
 'num_parallel_tree': 1,
 'predictor': 'auto',
 'random_state': 0,
 'reg_alpha': 0.980300725,
 'reg_lambda': 0.00221248553,
 'scale_pos_weight': 0.24369747899159663,
 'subsample': 1,
 'tree_method': 'exact',
 'validate_parameters': 1,
 'verbosity': 0,
 'eval_metric': ['auc', 'logloss'],
 'lambda': 0.002212485584996869,
 'alpha': 0.980300751529644,
 'eta': 9.483455063850674e-05,
 'grow_policy': 'depthwise'}
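One thing I notice in these parameters: with base_score = 0.5 the initial margin (log-odds) is 0, and each of the 15 trees can only shift it by roughly leaf_value * eta. A back-of-the-envelope check (the leaf value of 2.0 is an assumed, generous per-tree output, not taken from my model) shows why the probabilities barely move off 0.5:

```python
import math

# base_score = 0.5 means the model starts at a margin (log-odds) of 0.
# Each boosting round adds leaf_value * eta to the margin.
eta = 9.48e-05       # learning_rate from the tuned parameters above
n_trees = 15         # n_estimators from the tuned parameters above
leaf_value = 2.0     # assumed generous per-tree leaf output

margin = n_trees * eta * leaf_value
prob = 1 / (1 + math.exp(-margin))

print(round(margin, 6))  # ~0.002844
print(round(prob, 4))    # ~0.5007 -- barely off base_score
```

So even in the best case the final probability lands within a few thousandths of 0.5, which matches the 0.48-0.51 range I'm seeing.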
yuan-ning
  • What's your question? Please [edit] to clarify. – Sycorax Feb 17 '22 at 22:59
  • 3
    Did your model perhaps fit only a small number of trees? A constant (or close to constant) prediction is correct -- it's the optimal thing to do -- in problems where X is not very predictive of Y. If you don't have much signal, your predictions will be close to constant. – Adrian Feb 17 '22 at 23:10
  • Thanks Adrian! I think that's what's happening! I may have too much regularization and it prevented the model from learning the signal. – yuan-ning Feb 18 '22 at 20:27

0 Answers