Problem of unbalanced data

Question

Unbalanced data is an issue that can affect the performance of a classification model. Several remedies can be used to balance the data; two of them are upsampling and downsampling. My questions are:

How do you know which method is best for your model?
Is it true that we need to compare F1-scores rather than accuracy scores?
Do we also need to fit a model on the unbalanced data and compare it to the upsampled/downsampled models?

PS: Running cross-validation to check all of them will take a lot of time, especially with data that has a large number of observations and with complex models.

score 0 · Answer 1 · answered Jan 20 '20 at 14:15

You could try stratified k-fold cross-validation instead of plain k-fold cross-validation. In stratified k-fold cross-validation, the folds are chosen so that the class proportions (and hence the mean response value) are approximately the same in every fold.

In other words, stratification rearranges the data so that each fold is a good representative of the whole.
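To make the idea concrete, here is a minimal pure-Python sketch of how stratified folds can be built. The function name `stratified_folds` and the toy labels are made up for illustration, and the sketch does not shuffle the data; in practice you would more likely use a library implementation such as scikit-learn's `StratifiedKFold`.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds that each keep roughly the overall class ratio.

    Hypothetical helper for illustration only (no shuffling).
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin across the folds,
    # so every fold receives its share of every class.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy unbalanced labels (assumed data): 90 majority (0) and 10 minority (1).
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)

# Count the classes inside each fold.
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
for n, counts in enumerate(fold_counts):
    print(n, dict(counts))
```

With these toy labels, every fold ends up with the same 9:1 class ratio as the full data set, which is what keeps the per-fold scores comparable.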

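F1-score is generally a more appropriate performance metric than accuracy here, because on unbalanced data a model can reach high accuracy simply by always predicting the majority class.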
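To illustrate why accuracy can mislead on unbalanced data, here is a small sketch that scores a trivial "always predict the majority class" model. The labels are invented toy data and the helper functions are hand-written for illustration; library equivalents such as scikit-learn's `accuracy_score` and `f1_score` would normally be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1-score for the class given by `positive` (here: the minority class)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision and recall are zero or undefined,
        # so the F1-score is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
minority_f1 = f1(y_true, y_pred, positive=1)
print("accuracy:", acc)
print("F1 (minority class):", minority_f1)
```

The accuracy looks excellent even though the model never finds a single minority sample, while the F1-score for the minority class drops to zero.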
