
I was asked the following question: which classification loss function is relatively insensitive to imbalanced samples (tree, regression, etc.)?

I know that imbalanced samples affect performance measures such as accuracy, recall, ROC, and AUC. Usually we use resampling (undersampling or oversampling) to pre-process imbalanced data. But I don't know which classifier is relatively insensitive to imbalanced samples.

user6703592
  • Class imbalance almost certainly is not a problem, and there is no need to use undersampling or oversampling to solve a non-problem. https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Nov 14 '21 at 13:35
  • @Dave Actually, I am not sure what `classification loss function` means here. Did the interviewer simply want me to say that measures like accuracy and ROC do not perform well on imbalanced samples, rather than to name a classifier that is not sensitive to imbalance? – user6703592 Nov 14 '21 at 13:46
  • I would discuss the reasons why class imbalance is not such a problem, which is explained quite well in the first link I gave. // What does it even mean to be “sensitive” to class imbalance? – Dave Nov 14 '21 at 13:53
  • @Dave From your first link, I understand the following: 1. Imbalance does not affect the training; even the large variance of the estimates for the minority class comes from the lack of samples rather than from the imbalance between the minority and majority classes. 2. Imbalance really does affect criteria like accuracy; however, accuracy itself is not a perfect criterion for all classifiers. Please correct me if I have misunderstood something. – user6703592 Nov 14 '21 at 14:10
  • I have yet to figure out what it means to "affect" the training or what "sensitivity" to class imbalance means. Yes, the data influence what you're doing and the results you will get. // Accuracy is problematic in imbalance because you could get an impressive-looking $98\%$ accuracy, even though guessing everything to be a member of the majority class outperforms your model by giving $99\%$ accuracy. Kolassa gets into why accuracy is a problem for perfectly balanced data sets, too, however. – Dave Nov 14 '21 at 14:22
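To make the $98\%$ versus $99\%$ point in the last comment concrete, here is a minimal sketch in plain Python. All numbers are hypothetical and chosen only to reproduce those two figures: 1000 cases with a 1% minority class, and a made-up "model" that finds half of the minority cases at the cost of 15 false alarms. It is an illustration of the arithmetic, not anyone's actual classifier.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted as 1."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / sum(y_true)


# Hypothetical data set: 10 minority cases (1) followed by 990 majority cases (0).
y_true = [1] * 10 + [0] * 990

# Baseline: guess the majority class for every case.
baseline_pred = [0] * 1000

# Hypothetical model: catches 5 of the 10 minority cases (5 missed),
# and wrongly flags 15 of the 990 majority cases.
model_pred = [1] * 5 + [0] * 5 + [1] * 15 + [0] * 975

baseline_acc = accuracy(y_true, baseline_pred)   # 990 correct out of 1000
model_acc = accuracy(y_true, model_pred)         # 5 + 975 = 980 correct out of 1000
baseline_rec = recall(y_true, baseline_pred)     # finds none of the minority cases
model_rec = recall(y_true, model_pred)           # finds 5 of 10 minority cases

print(f"baseline: accuracy={baseline_acc:.2f}, recall={baseline_rec:.2f}")
print(f"model:    accuracy={model_acc:.2f}, recall={model_rec:.2f}")
```

Under these assumptions the majority-class guess wins on accuracy (0.99 against 0.98) even though it never identifies a single minority case, which is why accuracy alone says little here.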
