
I am working with an imbalanced dataset: 70k samples of class 0 and 1k samples of class 1, with 12 features. I would like to perform classification in order to identify the important features. So far I have tried under-sampling, over-sampling, a hybrid (over-sampling and then under-sampling), and SMOTE, but the performance metrics are terrible. I have also tuned my models (decision trees and random forest). Here are my results with SMOTE. What should I do to improve the performance?

f1-score = 0.06
precision = 0.03
recall = 0.48
accuracy = 0.77
AUC = 0.70
Confusion matrix:
[[13685  4082]
 [  134   125]]
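
As a sanity check, the reported metrics can be recomputed from the confusion matrix. This is a minimal sketch, and it assumes the matrix follows the scikit-learn layout (rows are true classes, columns are predicted classes, class 0 first); the variable names are illustrative only.

```python
# Counts taken from the confusion matrix above.
# Assumption: rows = true class, columns = predicted class, class 0 first.
tn, fp = 13685, 4082
fn, tp = 134, 125

total = tn + fp + fn + tp

precision = tp / (tp + fp)          # share of predicted positives that are truly positive
recall = tp / (tp + fn)             # share of true positives that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total
prevalence = (tp + fn) / total      # share of positives in the evaluated data

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f} prevalence={prevalence:.3f}")
# prints: precision=0.03 recall=0.48 f1=0.06 accuracy=0.77 prevalence=0.014
```

Under that assumption the numbers agree with the reported ones. Positives make up only about 1.4% of the evaluated data, so a precision of 0.03 is roughly twice what random flagging would give.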
ricecooker
  • https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/questions/359909/is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting/359936#359936 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Mar 24 '21 at 14:31
  • Please see [this discussion](https://stats.stackexchange.com/q/283170/28500) about imbalanced data, [this discussion](https://stats.stackexchange.com/q/285231/28500) about SMOTE and similar approaches, and [this discussion](https://stats.stackexchange.com/q/312780/28500) about why you should consider different performance metrics. – EdM Mar 24 '21 at 14:32
  • Thank you! Could the low performance be because my X features are nonsense? – ricecooker Mar 24 '21 at 15:28
