When building machine learning models for forecasting, the dataset is sometimes imbalanced. There are several ways to deal with this, such as resampling the data or choosing a metric other than accuracy (for example, recall). Can these two methods be used at the same time, or is one of them enough to solve the imbalanced-data problem? What are the pros and cons of each method?
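The two approaches act at different stages: resampling changes the data the model is trained on, while the metric decides how the model is judged on held-out data. That means they could be combined. Below is a minimal pure-Python sketch, assuming random oversampling as the resampling method and toy hypothetical data. The function names (`accuracy`, `recall_precision`, `random_oversample`) are illustrative, not from any library; imbalanced-learn's `RandomOverSampler` and scikit-learn's `recall_score` provide tested versions. The sketch shows that accuracy can look good for a useless model while recall exposes it, and that oversampling is applied to the training split only, so the test split keeps the real class distribution.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_precision(y_true, y_pred, positive=1):
    """Recall and precision for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one. Apply to the training split only."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical test split: 95 negatives, 5 positives. It is NOT resampled.
y_test = [0] * 95 + [1] * 5
always_negative = [0] * len(y_test)  # trivial "majority class" model
print(accuracy(y_test, always_negative))          # 0.95
print(recall_precision(y_test, always_negative))  # (0.0, 0.0)

# Hypothetical training split with a 90/10 imbalance, rebalanced before fitting.
X_train = [[float(i)] for i in range(100)]
y_train = [0] * 90 + [1] * 10
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))  # Counter({0: 90, 1: 90})
```

The "always negative" model scores 95% accuracy but 0 recall, which is why a minority-class metric is useful whether or not the training data was resampled.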
You can also use a probabilistic classifier (basically anything using log loss as the loss function): logistic regression, XGBoost, neural networks. They don't give just the class but the probability of the class, and then the imbalance is irrelevant... it's just another probability value. – seanv507 Dec 27 '19 at 10:24
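A minimal pure-Python sketch of the probabilistic approach described in the comment above, using logistic regression fitted by plain batch gradient descent on log loss. The data, function names, learning rate and epoch count are hypothetical choices for illustration only; in practice, scikit-learn's `LogisticRegression` (via `predict_proba`) or XGBoost with the `binary:logistic` objective would do the fitting with better optimizers. The point it illustrates: the model outputs a probability for each example, and turning that into a class label is a separate threshold choice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_log_loss(xs, ys, w, b):
    """Average log loss (binary cross-entropy) of p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit_logistic(xs, ys, lr=1.0, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. w*x + b
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


# Hypothetical 1-D data: 20 points, 3 positives (15% base rate), not separable.
xs = [round(-1.0 + 0.1 * i, 1) for i in range(20)]
ys = [0] * 20
for pos in (14, 17, 19):  # x = 0.4, 0.7, 0.9
    ys[pos] = 1

w, b = fit_logistic(xs, ys)
probs = [sigmoid(w * x + b) for x in xs]
print("log loss:", mean_log_loss(xs, ys, w, b))

# With an intercept, the mean predicted probability on the training data
# approaches the base rate at the optimum (the gradient w.r.t. b is zero there).
print(sum(probs) / len(probs), sum(ys) / len(ys))

# Turning probabilities into classes is a separate choice of threshold.
for threshold in (0.5, 0.2):
    preds = [int(p >= threshold) for p in probs]
    print(threshold, sum(preds), "predicted positives")
```

Lowering the threshold can only add predicted positives, which trades precision for recall. Where to put it depends on the relative cost of the two kinds of error, and that decision is separate from fitting the probabilities.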
Many duplicates on this site! https://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression, https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning, https://stats.stackexchange.com/questions/235808/binary-classification-with-strongly-unbalanced-classes, https://stats.stackexchange.com/questions/247871/what-is-the-root-cause-of-the-class-imbalance-problem, https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models – kjetil b halvorsen Dec 27 '19 at 12:24