I'm working with a dataset containing several classes. The largest class has over 500 samples, and the smallest classes have fewer than 10. I know that you should perform upsampling inside the cross-validation loop to prevent data leakage between folds... however, this results in awful balanced accuracy because there are so few samples in those tiny classes.

Would it be valid to perform a small amount of upsampling with a technique like SMOTE before cross-validation, to increase the number of samples in the smallest classes to, say, 20, and then perform further upsampling with SMOTE as usual inside the cross-validation loop?
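
To make the setup concrete, here is a rough sketch of the "upsampling inside the cross-validation loop" part, using imbalanced-learn's `SMOTE` inside a `Pipeline` so that synthetic samples are generated from the training folds only. The synthetic data and the classifier are placeholders, not my actual setup:

```python
# Sketch: SMOTE applied only within each training fold, so synthetic
# samples never leak into the matching validation fold.
from collections import Counter

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Made-up data roughly matching the question: 500+ samples in the
# largest class, around 10 in the smallest.
X, y = make_classification(
    n_samples=550, n_classes=3, n_informative=5,
    weights=[0.92, 0.06, 0.02], random_state=0,
)
print(Counter(y))

# k_neighbors must stay below the smallest class's per-fold count,
# which is exactly what becomes hard with fewer than 10 samples.
pipe = Pipeline([
    ("smote", SMOTE(k_neighbors=2, random_state=0)),
    ("clf", RandomForestClassifier(random_state=0)),
])

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="balanced_accuracy")
print(scores.mean())
```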

Avelina
  • Unbalanced classes are almost certainly not a problem, and oversampling will not solve a non-problem: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Mar 31 '21 at 14:39
  • Do not use accuracy to evaluate a classifier: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) [Is accuracy an improper scoring rule in a binary classification setting?](https://stats.stackexchange.com/q/359909/1352) [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352) – Stephan Kolassa Mar 31 '21 at 14:39
  • @StephanKolassa I hope you're in the process of making a bot to automate your response to class imbalance questions! But since it is relevant, can you confirm specifically that SMOTE is not useful? Or are there cases where it is? I can't figure out what SMOTE's utility is ([What problem does oversampling, undersampling, and SMOTE solve?](https://stats.stackexchange.com/questions/285231/what-problem-does-oversampling-undersampling-and-smote-solve)), but likewise I have not found a clear statement that it is actually useless. Additionally, it would be surprising to me if a commonly known and utilized algorithm actually served no constructive purpose. – Ryan Volpi Mar 31 '21 at 15:15
  • @RyanVolpi: that bot would indeed be an interesting idea... to be honest, I have not looked into SMOTE specifically, simply because I am indeed convinced that it is an attempt to solve a non-problem. If you find interesting evidence against this preconception of mine, please do tell me. In the meantime, unfortunately it is not at all uncommon for a commonly known and utilized theory to serve no constructive purpose. I'll start calling upsampling the [humor theory](https://en.wikipedia.org/wiki/Humorism) of classification. Let's hope it won't take 2000 years to fall out of favor. – Stephan Kolassa Mar 31 '21 at 15:21
  • @StephanKolassa How would I then train a classifier that has no notion of class weighting (e.g. MLPClassifier in sklearn)? I was under the impression that resampling in cases like these was the only way to reduce the classifier's bias towards certain classes. – Avelina Mar 31 '21 at 15:39
  • I would recommend (see the threads above) not using a classifier that outputs a hard classification, but using probabilistic classifications. And evaluating these not with accuracy (which also necessitates a threshold, which is another problem; see the threshold thread above), but with proper scoring rules. Per the very first question linked above, there is then no bias problem any more, just possibly very small predicted class memberships, which is precisely as it should be. (A minimal sketch of this follows the comments.) – Stephan Kolassa Mar 31 '21 at 15:45
  • @StephanKolassa Right, so I've switched to ROC AUC for scoring, and I've developed a thresholding function to scale the probabilities predicted from the network, and then 'learn' the coefficients for thresholding too by trying to maximise the average ROC AUC across all folds. Is this on the right track? It has actually improved things a little. Not massively, but that just means my network hyperparams might need improvement too. Thank you! I wish I could upvote comments but my account is too new. – Avelina Apr 02 '21 at 12:16
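
For illustration, here is a minimal sketch of what the comments suggest: score the classifier's predicted probabilities with a proper scoring rule (log loss) instead of a thresholded metric like accuracy. The synthetic data and model settings are placeholders, not the actual setup from the question:

```python
# Sketch: evaluate predicted probabilities with a proper scoring rule
# (log loss) rather than a hard-classification metric such as accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier

# Made-up imbalanced data, as in the earlier sketch.
X, y = make_classification(
    n_samples=550, n_classes=3, n_informative=5,
    weights=[0.92, 0.06, 0.02], random_state=0,
)

# MLPClassifier has no class_weight option, but it outputs class
# probabilities via predict_proba, which is all a proper score needs.
clf = MLPClassifier(max_iter=1000, random_state=0)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# neg_log_loss: closer to 0 is better; no threshold is involved.
scores = cross_val_score(clf, X, y, cv=cv, scoring="neg_log_loss")
print(scores.mean())
```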

0 Answers