Recently, I have been digging into the selection of tuning parameters for binary classification.
I gathered information by searching online, and the following is my summary.
- We can distinguish binary classification problems into two cases:
  - Balanced classification: the proportion of the two classes is roughly 50:50
  - Imbalanced classification: the class proportions are heavily skewed
- In imbalanced classification problems, "accuracy" is not a good metric of model performance:
  - Most classification algorithms are developed with balanced classes in mind, so when the data set is imbalanced the fitted model tends to classify almost all observations as the dominant class.
  - Therefore, even though the minority class is almost never classified correctly, the accuracy is still very high.
  - Hence, when we tune the tuning parameters by cross-validation, accuracy is not a good metric.
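To make the point above concrete, here is a minimal sketch in pure Python (the 95:5 class split is made up for illustration): a degenerate "classifier" that always predicts the majority class scores very high accuracy while never detecting a single minority observation.

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives (95:5 split).
y_true = [0] * 95 + [1] * 5

# "Majority" classifier: always predicts class 0, regardless of the input.
y_pred = [0] * len(y_true)

# Accuracy looks excellent...
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# ...but recall on the minority class (fraction of true positives found) is zero.
recall_minority = (
    sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    / sum(t == 1 for t in y_true)
)

print(accuracy)         # high, despite the model learning nothing
print(recall_minority)  # zero: every minority observation is missed
```

A parameter search that maximizes cross-validated accuracy can therefore happily converge to models that behave like this degenerate one.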
My questions are as follows:
Is any part of my understanding wrong?
If my understanding is not critically wrong, then I think we can use accuracy to tune the parameters in "balanced" classification problems. But I am not sure whether this idea is correct.
In every classification problem, we have to choose a probability threshold (also called the decision threshold or discriminant cutoff). Given the discussion above, using accuracy as the metric for selecting this threshold is a bad idea only in imbalanced classification problems. I think, however, that accuracy could be a good metric for threshold selection in a balanced classification problem. Is this idea sound?
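For reference, this is the kind of threshold selection I have in mind, as a minimal pure-Python sketch. The labels and scores below are invented for illustration; in practice the scores would be predicted probabilities from a fitted model on held-out (cross-validation) data.

```python
# Hypothetical balanced data: 4 negatives, 4 positives, with made-up scores.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.45, 0.6, 0.4, 0.55, 0.7, 0.9]

def accuracy_at(threshold):
    """Accuracy when predicting class 1 whenever score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)

# Sweep a grid of candidate thresholds (0.05, 0.10, ..., 0.95) and keep
# the one maximizing accuracy; ties go to the first candidate found.
candidates = [i / 20 for i in range(1, 20)]
best = max(candidates, key=accuracy_at)
print(best, accuracy_at(best))
```

My question is whether maximizing accuracy over the threshold grid like this is defensible when the classes are balanced, even though it clearly is not when they are imbalanced.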
Thank you for taking the time to read this long question.