I fit a random forest to my imbalanced dataset, in which class 1 is the minority class. The AUC I got on the original imbalanced data was better than the AUC I got on re-sampled data (over- or under-sampling). Can someone offer a theoretical explanation for this observation, or point out the underlying problem that caused this spurious result?
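One possible theoretical reason, offered as a sketch rather than a diagnosis of this particular setup: AUC depends only on how the scores rank positives against negatives. If a model outputs the posterior p for class 1, re-sampling that multiplies the class-1 prior by a factor w changes that posterior, in the ideal case, to p' = wp / (wp + 1 - p). This is strictly increasing in p, so the ranking and therefore the AUC are unchanged. Likewise, duplicating every positive in an evaluation set scales the win count and the pair count by the same factor. The pure-Python check below uses hypothetical synthetic scores; `auc` and `prior_shift` are illustrative helpers, and exact fractions are used only to rule out floating-point ties.

```python
from fractions import Fraction
import random


def auc(labels, scores):
    """Rank-based AUC (the Mann-Whitney U statistic): the fraction of
    (positive, negative) pairs in which the positive scores higher,
    counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prior_shift(p, w):
    """Posterior for class 1 after its prior is multiplied by w, assuming the
    class-conditional feature distributions stay the same.
    Strictly increasing in p on [0, 1] for any w > 0."""
    return p * w / (p * w + (1 - p))


rng = random.Random(0)
# Hypothetical scores: 20 minority-class positives and 180 negatives,
# with positives tending to score higher.
labels = [1] * 20 + [0] * 180
scores = [
    Fraction(rng.randint(300, 999), 1000) if y == 1
    else Fraction(rng.randint(1, 700), 1000)
    for y in labels
]

# Up-weighting implied by re-sampling the classes to 50/50: 180 / 20 = 9.
w = Fraction(180, 20)
shifted = [prior_shift(p, w) for p in scores]

# Duplicate every positive so it appears k times in total (an idealized
# over-sampled evaluation set); wins and pair count both scale by k.
k = 9
pos_scores = [s for y, s in zip(labels, scores) if y == 1]
dup_labels = labels + [1] * (len(pos_scores) * (k - 1))
dup_scores = scores + pos_scores * (k - 1)

print(auc(labels, scores) == auc(labels, shifted))         # True
print(auc(labels, scores) == auc(dup_labels, dup_scores))  # True
```

So, in the ideal case, rebalancing is not expected to raise AUC at all. A random forest refit on re-sampled data will not reproduce this exact monotone shift, though. Duplicated minority rows can be memorized by fully grown trees, and under-sampling discards majority rows. Either effect could plausibly lower AUC, which would be consistent with the observation in the question.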
Why did you think balancing your data would improve performance? – Matthew Drury Jan 06 '17 at 02:28
When I faced imbalanced data, I searched Google, and re-sampling was commonly suggested as a strategy for this situation, so I presumed it should help somehow. – LUSAQX Jan 06 '17 at 02:38
Perhaps your sampling technique is not maintaining the balance of the classes? – John Stud Feb 01 '21 at 01:05
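A concrete way to follow up on that comment is to count class labels after re-sampling. It is also worth confirming that re-sampling touched only the training rows, so that AUC is still measured on a test set with the original class ratio. The question does not say which re-sampling method or library was used. This is therefore a minimal pure-Python sketch that assumes random over-sampling with replacement. `stratified_split` and `random_oversample` are hypothetical helpers, not a library API.

```python
import random
from collections import Counter


def stratified_split(labels, test_frac, rng):
    """Split row indices into train/test, keeping each class's proportion."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def random_oversample(indices, labels, rng):
    """Duplicate randomly chosen rows (sampling with replacement) of each
    smaller class until every class matches the largest class in `indices`."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idx in by_class.values():
        out.extend(idx)
        out.extend(rng.choices(idx, k=target - len(idx)))
    return out


rng = random.Random(42)
labels = [1] * 50 + [0] * 950  # hypothetical data with a 5% minority class
train, test = stratified_split(labels, test_frac=0.3, rng=rng)
train_bal = random_oversample(train, labels, rng)  # re-sample training rows only

print(Counter(labels[i] for i in train_bal))  # both classes equally frequent
print(Counter(labels[i] for i in test))       # original 5% minority ratio kept
```

The balanced training indices would then select the rows used to fit the forest, and the untouched `test` indices would select the rows used to compute AUC. Splitting before re-sampling also keeps copies of the same minority row from landing in both sets.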