I'm currently working on a project that uses an imbalanced dataset (two classes) for training, and I'm not sure whether I should apply a resampling procedure. Is there a way to actually test whether resampling is necessary to fix the imbalance?
Also see [When is unbalanced data really a problem in Machine Learning?](https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning). – kjetil b halvorsen Dec 08 '18 at 16:12
1 Answer
If you are going to use a method that doesn't work well when there is class imbalance, you are choosing the wrong method. Subsampling from good data, thus removing good observations from the analysis, is an atrocious solution that violates key statistical principles, and is a symptom of a method's poor statistical properties. More detail may be found here.
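As an illustration of the point about subsampling, here is a minimal sketch on simulated data. Everything in it is an assumption made for the example: the data-generating model (a single predictor with true intercept -3 and slope 1), the sample size, and the helper names `fit_logistic`, `mean_pred`, and `brier`. It fits a plain logistic regression once on all observations and once after downsampling the majority class, then compares each model's mean predicted probability with the observed prevalence and reports a Brier score on the full data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood fit of intercept a and slope b via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def mean_pred(a, b, xs):
    return sum(sigmoid(a + b * x) for x in xs) / len(xs)


def brier(a, b, xs, ys):
    return sum((sigmoid(a + b * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical simulated data: a rare positive class.
random.seed(0)
n = 5000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if random.random() < sigmoid(-3.0 + 1.0 * x) else 0 for x in xs]
prevalence = sum(ys) / n

# Fit on all observations.
a_full, b_full = fit_logistic(xs, ys)

# Fit after discarding majority-class observations to balance the classes.
pos = [i for i in range(n) if ys[i] == 1]
neg = [i for i in range(n) if ys[i] == 0]
keep = pos + random.sample(neg, len(pos))
a_sub, b_sub = fit_logistic([xs[i] for i in keep], [ys[i] for i in keep])

mean_full = mean_pred(a_full, b_full, xs)
mean_sub = mean_pred(a_sub, b_sub, xs)
brier_full = brier(a_full, b_full, xs, ys)
brier_sub = brier(a_sub, b_sub, xs, ys)

print("observed prevalence:       ", round(prevalence, 4))
print("mean prediction, all data: ", round(mean_full, 4))
print("mean prediction, subsample:", round(mean_sub, 4))
print("Brier score, all data:     ", round(brier_full, 4))
print("Brier score, subsample:    ", round(brier_sub, 4))
```

In this setup the full-data model's mean prediction matches the observed prevalence, because the intercept's likelihood equation forces that equality, while the downsampled model systematically overstates the probability of the rare class. The scores here are computed on the training data for brevity; a real comparison would use held-out data or resampling-based validation.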

Frank Harrell