Let's say I have a dataset where we have examples belonging to either class A or class B but we have more examples belonging to class B than A.
If I train a classifier on this data will it be useless?
Training a classifier on data with a large class imbalance can lead to problematic behaviour. For example, using the common classification rule in logistic regression of predicting class 1 when $p \geq 0.5$ and class 0 otherwise, a severe imbalance can result in every observation being assigned to the majority class.
In this case, it might be better to estimate the probability that each observation belongs to each class rather than committing to hard labels. This gives you the added flexibility of choosing your own cutoff should you want one, or better yet, of using the probabilities directly in any downstream decisions.