I’m working on a classification problem where the dataset is extremely imbalanced (roughly 13,000 "zero" responses and 100 "one" responses).
As a first step, I trained a logistic regression model and, by lowering the cutoff probability, managed to predict most of the "one" responses correctly. However, a fair number of "zero" responses were then incorrectly classified as "one".
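To make the trade-off concrete, here is a minimal sketch in plain Python. The labels and probabilities are made up for illustration and are not from my dataset; in scikit-learn, the probabilities would typically come from `LogisticRegression.predict_proba`. The helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, probs, cutoff):
    """Return (tp, fp, fn, tn) when predicting 1 for prob >= cutoff."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up labels and predicted probabilities of class "one" (illustration only).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.25, 0.60]

# Compare the default cutoff with a lower one.
for cutoff in (0.5, 0.2):
    tp, fp, fn, tn = confusion_counts(y_true, probs, cutoff)
    print(f"cutoff={cutoff}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

With these toy numbers, lowering the cutoff from 0.5 to 0.2 recovers the missed "one" case but flags several "zero" cases as "one", which is the same pattern I see on the real data.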
So I would like to know: which algorithms handle imbalanced datasets well?
Thanks,
P.S. I’m looking for algorithms that are available in scikit-learn or as an R package.