
I have built a CNN model to classify positive and negative cases in my data. The accuracy is around 85% with an FPR of 16%. I know the FPR is high, but it gave an acceptable number of false positives in training and testing on a balanced dataset, and I was happy with that. However, I was then given an extremely imbalanced dataset to predict, where the number of negatives is 770 times higher than the positives, so the number of FP after predicting is extremely high compared to TP. Is there anything I can do to reduce that FPR? I have thought of building another model to re-classify the predicted positives (TP and FP) in the hope of rescuing more of the FP.

Thank you

Anna
    Unbalanced classes are almost certainly not a problem, and oversampling or downsampling (testing a balanced set) will not solve a non-problem: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he) // If you *must* deal with the discrete categories, the CNN outputs a probability. Change the threshold for making a hard classification. // IMPORTANT: Why not call everything a negative case and have zero false positives? – Dave Jul 06 '21 at 17:46
  • Because the main purpose of the model is to detect positive cases; if we call everything a negative case, then we don't need the model at all. – Anna Jul 06 '21 at 18:15
  • Then why not call everything a positive case? Do false positives and false negatives have different costs (one is worse than the other)? – Dave Jul 06 '21 at 18:16
  • Don't use accuracy, precision, recall, sensitivity, specificity, the F1 score, or any of the rates (FPR, TPR, ...). Every criticism at the following threads applies equally to all of these, and indeed to all evaluation metrics that rely on hard classifications: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) [Is accuracy an improper scoring rule in a binary classification setting?](https://stats.stackexchange.com/q/359909/1352) [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352) – Stephan Kolassa Jul 06 '21 at 18:42
  • Instead, use probabilistic classifications, and evaluate these using [proper scoring rules](https://stats.stackexchange.com/tags/scoring-rules/info). – Stephan Kolassa Jul 06 '21 at 18:42
  • Thanks Stephan, I'll look at it – Anna Jul 06 '21 at 18:47
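
The threshold adjustment Dave suggests can be sketched as follows. This is a minimal illustration, not the asker's model: `pick_threshold` is a hypothetical helper, and the probabilities here are drawn from Beta distributions purely to stand in for a classifier's output. The idea is to keep the CNN's probabilistic output and choose the hard-classification cutoff to meet a target false-positive rate, rather than using the default 0.5. A Brier score is also computed as an example of a proper scoring rule for evaluating the probabilities themselves.

```python
import numpy as np

def pick_threshold(y_true, y_prob, max_fpr=0.05):
    """Return the lowest probability cutoff whose FPR on this data is <= max_fpr."""
    neg = (y_true == 0)
    for t in np.sort(np.unique(y_prob)):
        # FPR at cutoff t: fraction of negatives scored at or above t
        fpr = np.mean(y_prob[neg] >= t)
        if fpr <= max_fpr:
            return t
    return 1.0  # no cutoff meets the target; predict nothing positive

# Toy stand-in for classifier output: positives tend to score high, negatives low.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(50), np.zeros(500)])
y_prob = np.concatenate([rng.beta(5, 2, 50), rng.beta(2, 5, 500)])

t = pick_threshold(y_true, y_prob, max_fpr=0.02)
pred = (y_prob >= t).astype(int)
fpr = np.mean(pred[y_true == 0])
tpr = np.mean(pred[y_true == 1])
brier = np.mean((y_prob - y_true) ** 2)  # proper scoring rule on the probabilities
print(f"threshold={t:.3f}, FPR={fpr:.3f}, TPR={tpr:.3f}, Brier={brier:.3f}")
```

Note the trade-off: lowering the target FPR raises the cutoff and sacrifices some true positives, which is why the relative costs of FP and FN (Dave's second question) matter. The cutoff should be chosen on held-out data, ideally data whose class balance matches the deployment set.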

0 Answers