0

I am working on a research paper concerning developing a CNN model for a multi-class classification on images. I have a large dataset consisting of 3 classes summing up to 20000 images. Class 1 has 7000 images, Class 2 has 10000 images and Class 3 has 3000 images. Can I subsample Class 1 and Class 2 down to 3000 images both, so I can have 3000 images for each class? I feel like oversampling can increase training time a lot.

Note: I have constraints on resources because I don't have a powerful machine. I only use the free version of Google Colab. I am willing to get the Colab Pro however I'm not from US or Canada.

djbacs
  • 51
  • 2

1 Answers1

1

Simply sample from your original data, without weighting or over-/undersampling. Unbalanced classes are almost certainly not a problem, and oversampling will not solve a non-problem: Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?

Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357