
I am trying to train a neural network to classify chest X-ray scans as my final MSc project. I have a dataset of 13,808 images: 3,616 labelled COVID and 10,192 labelled normal, so the split is roughly 26.2% COVID to 73.8% normal. COVID is the positive class and Normal is the negative class. I am using Keras to build a CNN, and I am a bit overwhelmed by all the different metrics.
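
In case it is useful, here is a minimal sketch of how I could reweight the classes in Keras from these counts (the same scheme as the keras.io example linked in the comments below; whether such reweighting is needed at all is also debated there):

```python
# Class weights from the stated counts: 13,808 images total,
# 3,616 COVID (positive) and 10,192 normal (negative).
n_total, n_covid, n_normal = 13_808, 3_616, 10_192

# Weight each class inversely to its frequency so both classes
# contribute equally to the loss.
class_weight = {
    0: n_total / (2 * n_normal),  # normal -> ~0.68
    1: n_total / (2 * n_covid),   # COVID  -> ~1.91
}
print(class_weight)  # later passed to model.fit(..., class_weight=class_weight)
```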

I have read that accuracy is a poor measure of performance, especially for imbalanced datasets, and that for medical imaging it is common to use sensitivity and specificity, as well as metrics like F1-score, AUC-ROC, and AUC-PR.
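
For concreteness, this is roughly how I am attaching those metrics in Keras. The tiny CNN and the 224×224 greyscale input shape are only placeholders for my real model:

```python
import tensorflow as tf

# Placeholder CNN: the layers and input shape are assumptions standing in
# for the actual architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # log loss, itself a proper scoring rule
    metrics=[
        tf.keras.metrics.BinaryAccuracy(name="accuracy"),
        tf.keras.metrics.Recall(name="sensitivity"),   # TP / (TP + FN)
        tf.keras.metrics.Precision(name="precision"),  # TP / (TP + FP)
        tf.keras.metrics.AUC(curve="ROC", name="auc_roc"),
        tf.keras.metrics.AUC(curve="PR", name="auc_pr"),
        # Keras has no plain specificity metric; it can be derived after
        # evaluation as TN / (TN + FP) from the two counters below.
        tf.keras.metrics.TrueNegatives(name="tn"),
        tf.keras.metrics.FalsePositives(name="fp"),
    ],
)
```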

My reasoning is that minimizing false negatives, and therefore maximizing sensitivity/recall, is the priority in this context: classifying someone as COVID-free when they actually have it would let the virus spread. False positives are also undesirable, since people would take unnecessary precautions, but minimizing them matters less than minimizing false negatives.
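
To make that trade-off concrete, here is a sketch of how a decision threshold could be chosen to favour sensitivity, using toy stand-in labels and scores rather than my real validation data:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy stand-in data; in practice these would be validation labels and the
# model's predicted probabilities.
y_val = np.array([0, 0, 0, 1, 1, 0, 1, 0])
p_val = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.30, 0.05])

precision, recall, thresholds = precision_recall_curve(y_val, p_val)

# The last precision/recall point has no threshold, so drop it, then take
# the highest threshold whose recall still meets the target.
target_recall = 0.95
idx = np.flatnonzero(recall[:-1] >= target_recall).max()
print(f"threshold={thresholds[idx]:.2f}  "
      f"sensitivity={recall[idx]:.2f}  precision={precision[idx]:.2f}")
# threshold=0.30  sensitivity=1.00  precision=0.60
```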

I am a conversion student in computer science, so I am relatively new to machine learning and statistics. I would greatly appreciate any advice on how much of a problem the class imbalance is and which metrics would be most appropriate in this context. Thank you.

  • But you are aware of the common pitfalls of using machine learning for such cases? For example, [this meta-analysis](https://www.nature.com/articles/s42256-021-00307-0) found that *all* such ML-based projects for classifying COVID were useless. – Tim Jul 17 '21 at 13:50
  • I don't really have a choice. It was assigned to me as my final project. It won't be used 'in the field', at least I don't think it will. It's just to demonstrate my ability to understand and solve a ML problem. – ParkTheMonkey Jul 17 '21 at 14:10
  • Sure, just thought you may find it interesting & learn what problems to avoid. – Tim Jul 17 '21 at 14:23
  • I do. Thank you for the paper! I can include it in my dissertation when discussing problems. – ParkTheMonkey Jul 17 '21 at 14:30
  • Good news! Class imbalance is not a problem! https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Jul 17 '21 at 15:04
  • But maybe more importantly, what techniques did you learn for this situation? The grader has an expectation. What is it? – Dave Jul 17 '21 at 15:14
  • Well, the focus of my project is to compare different learning algorithms to see which one performs best for the given problem. I'm stuck at this stage because I really want to understand which metrics I should be focusing on before training more models. I've seen some examples, like this one, which looked at precision and recall: https://keras.io/examples/vision/xray_classification_with_tpus/#correct-for-data-imbalance. – ParkTheMonkey Jul 17 '21 at 15:43
  • Perfect! You can compare the models on the proper scoring rules that are discussed in my links (a quick sketch follows this thread), which also discuss the shortcomings of threshold-based metrics. Further, you can use AUC to get some absolute sense of how you’re doing. (There are issues with that, but it can be comforting to get an AUC in the 90s, since a 90 is an A in school and you want an A in your class.) – Dave Jul 17 '21 at 15:58
  • Thank you so much, especially for the article. Everything's a lot clearer now! – ParkTheMonkey Jul 17 '21 at 16:31
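
A minimal sketch of the proper scoring rules suggested in the comments above (Brier score and log loss), again with toy stand-in data rather than real model output:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

# Toy stand-in data; in practice, validation labels and the predicted
# probabilities from each candidate model being compared.
y_val = np.array([0, 0, 0, 1, 1, 0, 1, 0])
p_val = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.30, 0.05])

# Proper scoring rules evaluate the predicted probabilities directly,
# with no threshold involved. Lower is better for both.
print("Brier score:", brier_score_loss(y_val, p_val))  # mean squared error of probabilities
print("Log loss:   ", log_loss(y_val, p_val))          # negative log-likelihood
# AUC-ROC as a threshold-free ranking summary, as suggested in the comments.
print("AUC-ROC:    ", roc_auc_score(y_val, p_val))
```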

0 Answers