I have a very imbalanced dataset, with a ratio of positive to negative samples of 1:496. The scoring metric is the F1 score, and my model of choice is LightGBM (I am using its scikit-learn implementation). I have read the docs on the class_weight parameter in LightGBM:
class_weight : dict, 'balanced' or None, optional (default=None)
    Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
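If I follow the docs correctly, the two settings should encode the same information for a binary problem. Here is a minimal sketch of my understanding, using made-up label counts at the 1:496 ratio (the y below is a stand-in, not my real data):

```python
import numpy as np

# Stand-in labels at roughly the 1:496 positive:negative ratio (not my real data)
y = np.array([1] * 10 + [0] * 4960)
neg, pos = np.bincount(y)[0], np.bincount(y)[1]

# What class_weight='balanced' computes: n_samples / (n_classes * np.bincount(y))
weights = len(y) / (2 * np.bincount(y))
print(weights)                  # ~[0.5, 248.5]
print(weights[1] / weights[0])  # ~496: positive class weighted 496x the negative

# The supposedly equivalent setting for binary tasks
scale_pos_weight = neg / pos    # 496.0
```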
Using the class_weight parameter on my dataset, which is a binary classification problem, I got a much better score (0.7899) than when I used the recommended scale_pos_weight parameter (0.2388). Should I use class_weight or scale_pos_weight to balance the classes?
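For reference, this is roughly how I ran the comparison; the synthetic data from make_classification below is only a stand-in for my real X and y:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from lightgbm import LGBMClassifier

# Synthetic stand-in for my real data at roughly a 1:496 positive:negative ratio
X, y = make_classification(n_samples=49_700, weights=[496 / 497],
                           flip_y=0, random_state=42)

# Compare the two balancing strategies on cross-validated F1
for params in ({'class_weight': 'balanced'},
               {'scale_pos_weight': 496}):  # scale_pos_weight = n_negative / n_positive
    clf = LGBMClassifier(**params)
    f1 = cross_val_score(clf, X, y, scoring='f1', cv=5).mean()
    print(params, round(f1, 4))
```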