
Suppose I have a classification model for $n$ classes ($n > 1$). The classifier returns a probability distribution over the set of classes. But if the classifier is not sure (i.e. no probability exceeds a given threshold), I would like the model to return the answer "none of the above". In other words, it is a classifier with the following decision rule:

$class(x) = \begin{cases} \underset{i}{\arg\max}\, p_i & \text{if } \exists_i\, p_i \geq \mathfrak{p} \\ \text{none-of-the-above} & \text{if } \forall_i\, p_i < \mathfrak{p} \end{cases}$

where $\mathfrak{p}$ is a probability threshold and $p_i$ is the probability that object $x$ belongs to class $i$.
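A minimal sketch of this decision rule, assuming the classifier's output is available as a list of per-class probabilities (the function name and interface here are illustrative, not from any particular library):

```python
def classify_with_rejection(probs, threshold):
    """Return the arg-max class index if any probability reaches the
    threshold, otherwise the string 'none-of-the-above'."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return best
    return "none-of-the-above"

print(classify_with_rejection([0.1, 0.8, 0.1], 0.7))    # confident: class 1
print(classify_with_rejection([0.4, 0.35, 0.25], 0.7))  # abstains
```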

The question is: how should I measure the model's accuracy?

One idea could be to calculate a standard metric (for example the $F_1$ score) and then multiply it by the fraction of cases in which a class was actually predicted:

$quality = F_1 \cdot {{\text{size of test set}\,-\,\text{number of "none-of-the-above" cases}}\over{\text{size of test set}}}$
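A sketch of this proposed metric, with a placeholder $F_1$ value standing in for whatever standard score is computed on the non-abstained cases:

```python
def quality(f1, test_set_size, num_abstentions):
    """Scale a standard score by the fraction of the test set on which
    the model committed to a class (its coverage)."""
    coverage = (test_set_size - num_abstentions) / test_set_size
    return f1 * coverage

# E.g. F1 = 0.9 on the predicted cases, 20 abstentions out of 100 cases
print(quality(0.9, 100, 20))  # roughly 0.72
```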

Is that a good idea? Or are there other approaches?

mc2

1 Answer


Do not use classification thresholds unless you understand the trade-offs of your decision. In the present context, don't output "class A", "class B", or "I'm not sure". Output probabilities: "A 98%, B 2%", "A 10%, B 90%", "A 49%, B 51%", regardless of any threshold. Then evaluate these probabilistic predictions using proper scoring rules. This obviates your problem completely.
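A sketch of evaluating probabilistic predictions directly with two common proper scoring rules, the Brier score and the log loss, using only the standard library (true labels are treated as one-hot vectors; lower is better for both):

```python
import math

def brier_score(probs, true_class):
    """Mean squared difference between the predicted probabilities
    and the one-hot encoding of the true class."""
    return sum((p - (1.0 if i == true_class else 0.0)) ** 2
               for i, p in enumerate(probs)) / len(probs)

def log_loss(probs, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_class])

# A confident, correct prediction scores better (lower) than an
# uncertain one under both rules -- no threshold required.
confident = [0.98, 0.02]
uncertain = [0.51, 0.49]
print(brier_score(confident, 0), brier_score(uncertain, 0))
print(log_loss(confident, 0), log_loss(uncertain, 0))
```

Averaging either score over a test set rewards well-calibrated probabilities, so the "I'm not sure" cases are penalized or rewarded smoothly instead of being counted through an arbitrary cutoff.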

More information and background here: Why is accuracy not the best measure for assessing classification models? and Is accuracy an improper scoring rule in a binary classification setting?

Stephan Kolassa