I have a very imbalanced dataset, with a ratio of positive to negative samples of 1:496. The scoring metric is the F1 score, and my model of choice is LightGBM (I am using its scikit-learn implementation). I have read the docs on the class_weight parameter in LightGBM:
class_weight : dict, 'balanced' or None, optional (default=None)
    Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
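If I follow the docs correctly, the two settings should encode the same information for a binary problem. Here is a minimal sketch of my understanding, using made-up label counts at the 1:496 ratio (the y below is a stand-in, not my real data):

```python
import numpy as np

# Stand-in labels at roughly the 1:496 positive:negative ratio (not my real data)
y = np.array([1] * 10 + [0] * 4960)
neg, pos = np.bincount(y)[0], np.bincount(y)[1]

# What class_weight='balanced' computes: n_samples / (n_classes * np.bincount(y))
weights = len(y) / (2 * np.bincount(y))
print(weights)                  # ~[0.5, 248.5]
print(weights[1] / weights[0])  # ~496: positive class weighted 496x the negative

# The supposedly equivalent setting for binary tasks
scale_pos_weight = neg / pos    # 496.0
```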
Using the class_weight parameter on my dataset, which is a binary classification problem, I got a much better score (0.7899) than when I used the recommended scale_pos_weight parameter (0.2388). Should I use class_weight or scale_pos_weight to balance the classes?
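For reference, this is roughly how I ran the comparison; the synthetic data from make_classification below is only a stand-in for my real X and y:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from lightgbm import LGBMClassifier

# Synthetic stand-in for my real data at roughly a 1:496 positive:negative ratio
X, y = make_classification(n_samples=49_700, weights=[496 / 497],
                           flip_y=0, random_state=42)

# Compare the two balancing strategies on cross-validated F1
for params in ({'class_weight': 'balanced'},
               {'scale_pos_weight': 496}):  # scale_pos_weight = n_negative / n_positive
    clf = LGBMClassifier(**params)
    f1 = cross_val_score(clf, X, y, scoring='f1', cv=5).mean()
    print(params, round(f1, 4))
```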