
The title essentially says it all. Below are some details regarding my data and model.

This is the current class distribution within my training set:

0    1353849
1      26217
Name: binary, dtype: int64

My training set includes 104 features.

My current recall is 94% and my current precision is 20%.

Here are the hyperparameters for my XGBoost model:

nrounds = 500, eta = 0.2, max_depth = 20, subsample = 0.8, colsample_bytree = 0.2, reg_alpha = 0.1, reg_lambda = 0.8

I've tried SMOTE, but it isn't working well, likely because of the high dimensionality. If you have any recommendations, they would be much appreciated.
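A commonly suggested alternative to resampling with SMOTE is to reweight the minority class directly via XGBoost's `scale_pos_weight` parameter. The sketch below is only illustrative: it computes the usual starting value (negative count / positive count) from the class distribution above and shows where it would slot into the existing hyperparameters; the `params` dict mirrors the settings in the question and is not tuned advice.

```python
# Sketch: reweight the positive class instead of oversampling with SMOTE.
# Class counts are taken from the training-set distribution in the question.
neg, pos = 1_353_849, 26_217
scale_pos_weight = neg / pos  # common heuristic starting point (~51.6)

# Params as they would be passed to xgboost.train (not executed here);
# values mirror the question's hyperparameters, plus the new weight.
params = {
    "objective": "binary:logistic",
    "eta": 0.2,
    "max_depth": 20,
    "subsample": 0.8,
    "colsample_bytree": 0.2,
    "reg_alpha": 0.1,
    "reg_lambda": 0.8,
    "scale_pos_weight": scale_pos_weight,
}
```

Note that increasing `scale_pos_weight` typically trades precision for recall, so it interacts directly with the threshold choice discussed in the comments.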

  • Given that only about 2% of your sample is 1, a recall of 94% and precision of 20% don't seem bad to me, but of course that depends on your domain. Otherwise, I'd reduce my max_depth to maybe 5, my subsample to maybe 0.5, my eta to maybe 0.02... – jbowman Feb 27 '18 at 19:04
  • Is there a way to extract probabilities instead of classifications? Then set a threshold manually. – ChootsMagoots Feb 27 '18 at 19:19
  • This question is effectively unanswerable, as it requires in-depth knowledge of your data. The following are suggestions: 1) Are you looking at a single point of precision/recall? It is in fact a curve, depending on the threshold you choose. 2) Have you tried hyperparameter tuning?: https://stats.stackexchange.com/questions/171043/how-to-tune-hyperparameters-of-xgboost-trees – Alex R. Feb 27 '18 at 19:40
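The comments above suggest working with predicted probabilities and sweeping the classification threshold rather than accepting the default 0.5 cutoff. A minimal self-contained sketch of that idea follows; the `y_true`/`y_prob` values are made up for illustration (in practice `y_prob` would come from the model's probability output, e.g. XGBoost's `binary:logistic` predictions).

```python
# Sketch: compute precision/recall at several probability thresholds
# to pick the operating point that suits the domain.
# Toy labels and predicted probabilities (made up for illustration):
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.55, 0.9, 0.05, 0.3]

def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when predicting 1 iff prob >= threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

Raising the threshold trades recall for precision, which is the curve Alex R.'s comment refers to; scikit-learn's `precision_recall_curve` computes the full curve in one call if that library is available.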
