I have an AI algorithm for medical diagnosis, and I want to test how well it performs in practice. The problem is that the algorithm does not give a yes/no answer but produces a score between 0 and 1. How would you test such an algorithm in practice?

Cesare

1 Answer

If your score is a predicted probability for whether some target condition is present or not, you can assess the quality of this prediction using proper scoring rules. My answer here may be helpful. Common scoring rules are the Brier and the log score. You can find arguments for and against both here.
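
As a minimal sketch of the two scoring rules named above, assuming the scores are predicted probabilities and the truth is coded as 0/1: the function names and the four data points below are made up for illustration. Lower values are better for both rules as written here.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_score(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed 0/1 outcomes."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        # Clip so that a prediction of exactly 0 or 1 does not give log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical data: algorithm output and the true yes/no diagnosis.
probs = [0.9, 0.2, 0.7, 0.4]
outcomes = [1, 0, 1, 0]

print("Brier score:", brier_score(probs, outcomes))
print("Log score:  ", log_score(probs, outcomes))
```

For these four cases the Brier score is about 0.075. A useful reference point is an uninformative forecast of 0.5 for every case, which gives a Brier score of 0.25.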

If your score is not a predicted probability but just some number that correlates with such a probability, you can run a logistic regression of the target outcome against the score. The predictions from this model, with the score as input, are probabilistic predictions, and you can feed these into your proper scoring rules.
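
The recalibration step described above could look roughly like the following sketch. It fits a one-predictor logistic regression by Newton-Raphson in plain Python; `fit_logistic`, `predict`, and the ten data points are hypothetical, and in practice a statistics library would normally do the fitting.

```python
import math

def fit_logistic(scores, outcomes, n_iter=25):
    """Fit P(y=1 | s) = 1 / (1 + exp(-(a + b*s))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, y in zip(scores, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * s      # gradient w.r.t. slope
            h00 += w               # entries of X'WX
            h01 += w * s
            h11 += w * s * s
        det = h00 * h11 - h01 * h01
        if det < 1e-12:            # singular system, e.g. all scores equal
            break
        # Newton step: solve (X'WX) * delta = gradient for the 2x2 case.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def predict(a, b, s):
    """Convert a raw score into a predicted probability."""
    return 1.0 / (1.0 + math.exp(-(a + b * s)))

# Hypothetical data: raw scores and the true yes/no diagnosis (1 = yes).
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.55, 0.60, 0.70, 0.80]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_logistic(scores, outcomes)
calibrated = [predict(a, b, s) for s in scores]

# The calibrated probabilities can now go into a proper scoring rule.
brier = sum((p - y) ** 2 for p, y in zip(calibrated, outcomes)) / len(outcomes)
print("intercept:", a, "slope:", b, "Brier score:", brier)
```

This sketch scores the same cases it was fitted on, which tends to look optimistic; fitting on one set of cases and scoring on a separate held-out set gives a fairer picture. Newton-Raphson can also fail to converge when the scores separate the two classes perfectly.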

Stephan Kolassa
  • Thanks a lot for your answer. Before I dive deeper into the topic: would the methods that you suggest allow me to validate the performance of an algorithm that outputs a p value, using as gold standard the yes/no diagnosis of an expert, or does the expert output also have to be 0-1? – Cesare Nov 23 '20 at 08:17
  • Hm. Do you consider the yes/no diagnosis from the expert the *actual truth*, or as an *alternative prediction* that your scores compete against (in which case the ground truth would need to come from somewhere else)? – Stephan Kolassa Nov 23 '20 at 08:24
  • The expert yes/no would be the actual truth – Cesare Nov 23 '20 at 08:32
  • Very good. Yes, that is exactly where you can use proper scoring rules, which compare probabilistic predictions with the actual truth (which here would be the expert's diagnosis). You may want to look at our [tag:scoring-rules] tag. – Stephan Kolassa Nov 23 '20 at 09:05