Deal with imbalanced data

Question

When building machine learning models for forecasting, the dataset is sometimes imbalanced. There are methods to deal with this, such as resampling the data and choosing a metric other than accuracy (for example, recall). Can these two methods be used at the same time, or is one of them enough to address the imbalance? What are the pros and cons of each method?
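To make the two options concrete, here is a minimal sketch in plain Python. It assumes a binary problem with a 95/5 class split. The helper names (`random_oversample`, `accuracy`, `recall`) and the toy data are hypothetical and written for this illustration only. Resampling changes the training data and the metric changes how predictions are scored, so the two steps are independent and nothing prevents combining them.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# Step 1 (resampling): applied to the training split only.
train_rows = [[i] for i in range(100)]      # hypothetical features
train_labels = [0] * 95 + [1] * 5           # 95 negatives, 5 positives
bal_rows, bal_labels = random_oversample(train_rows, train_labels)
balanced_counts = Counter(bal_labels)

# Step 2 (metric choice): scored on an untouched, still imbalanced test split.
# A classifier that always predicts the majority class looks good on
# accuracy but finds none of the positives.
test_labels = [0] * 19 + [1]
always_negative = [0] * 20
baseline_accuracy = accuracy(test_labels, always_negative)
baseline_recall = recall(test_labels, always_negative)

print(balanced_counts, baseline_accuracy, baseline_recall)
```

In this sketch the test split is left at its original class ratio, so the reported recall reflects the data the model would actually see.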

You can also use a probabilistic classifier (basically anything that uses log loss as its loss function): logistic regression, xgboost, neural networks. They give not just the class but the probability of each class, and then imbalance is irrelevant... it's just another probability value. — seanv507, Dec 27 '19 at 10:24
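A rough sketch of the idea in this comment, under stated assumptions: the one-dimensional toy data, the learning rate, and the iteration count are all made up for illustration, and the logistic regression is fitted by plain batch gradient descent rather than with any particular library. The model is trained on the imbalanced data as it is, and the decision threshold is chosen afterwards as a separate step. For brevity everything is evaluated on the training data.

```python
import math

# Hypothetical toy data: 95 negatives spread over [-2, 2], 5 positives over [1, 3].
xs = [-2 + 4 * i / 94 for i in range(95)] + [1 + 0.5 * j for j in range(5)]
ys = [0] * 95 + [1] * 5
n = len(xs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, probs):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(p) if y == 1 else math.log(1.0 - p)
                for y, p in zip(y_true, probs)) / len(y_true)

# Fit logistic regression by batch gradient descent on the log loss.
w, b, lr = 0.0, 0.0, 1.0
for _ in range(5000):
    probs = [sigmoid(w * x + b) for x in xs]
    grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
    grad_b = sum(p - y for p, y in zip(probs, ys)) / n
    w -= lr * grad_w
    b -= lr * grad_b

probs = [sigmoid(w * x + b) for x in xs]
base_rate = sum(ys) / n                      # 0.05 for this toy data
model_loss = log_loss(ys, probs)
baseline_loss = log_loss(ys, [base_rate] * n)  # always predict the base rate
mean_prob = sum(probs) / n

def recall_at(threshold):
    """Recall when every probability >= threshold is labelled positive."""
    hits = sum(1 for y, p in zip(ys, probs) if y == 1 and p >= threshold)
    return hits / sum(ys)

print(model_loss, baseline_loss, mean_prob, recall_at(0.5), recall_at(0.2))
```

The average predicted probability should end up close to the 5% base rate, which is one way to read "just another probability value". Lowering the threshold can only keep recall the same or raise it, so how to trade recall against false positives is a decision made after fitting.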
Many dups on this site! https://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression, https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning, https://stats.stackexchange.com/questions/235808/binary-classification-with-strongly-unbalanced-classes, https://stats.stackexchange.com/questions/247871/what-is-the-root-cause-of-the-class-imbalance-problem, https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models — kjetil b halvorsen, Dec 27 '19 at 12:24

