Unbalanced data is an issue that can affect the performance of a classification model. Several remedies can be used to balance the data; two of them are upsampling and downsampling. My questions are:

  • How do you know which method is best for your model?
  • Is it true that we need to compare F1-scores rather than accuracy scores?
  • Do we also need to fit a model on the unbalanced data and compare it to the upsampled/downsampled models?

PS: Doing cross-validation to check all of them will take a lot of time, especially with data that has a large number of observations and with complex models.

ayoub

1 Answer

You may try stratified k-fold cross-validation instead of classic k-fold cross-validation. In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all folds; for a classification problem, this means each fold has roughly the same class proportions as the full dataset.

In particular, stratification is the process of rearranging the data so as to ensure that each fold is a good representative of the whole.
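To make the idea concrete, here is a minimal pure-Python sketch of how stratified folds could be built. The function name `stratified_folds` and the toy labels are made up for illustration; in practice you would more likely use a library implementation such as scikit-learn's `StratifiedKFold`.

```python
from collections import defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds with similar class ratios.

    Hypothetical helper for illustration only, not a library function.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds,
        # so every fold receives about the same share of each class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy imbalanced labels (made-up data): 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)

fold_sizes = [len(fold) for fold in folds]
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(fold_sizes)
print(positives_per_fold)
```

With plain (unstratified) k-fold on data like this, a fold could end up with very few or no minority samples, which makes the per-fold scores unreliable.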

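For unbalanced data, the F1-score is a more appropriate performance metric than accuracy, because a model that always predicts the majority class can reach a high accuracy while never detecting the minority class.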
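A small sketch of why the choice of metric matters, using made-up labels (95 negatives, 5 positives) and a trivial "model" that always predicts the majority class. The helper functions are written by hand here for illustration; they follow the standard definitions of accuracy and F1 for a binary problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: treat F1 as 0 (precision or recall is 0 or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

majority_accuracy = accuracy(y_true, y_pred)
majority_f1 = f1_score(y_true, y_pred)
print(majority_accuracy, majority_f1)
```

The accuracy looks good even though the minority class is never found, while the F1-score for the minority class exposes the problem. Comparing the unbalanced, upsampled, and downsampled models on the same metric is what makes the comparison meaningful.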