
Say I have a set of 100 positive and 1,000 unlabeled points. Among the unlabeled points, some are presumably similar enough to the positives that they should belong to the positive class. Yet if I classify every unlabeled point as negative (and every labeled positive as positive), I get 100% accuracy, 100% precision, and 100% recall. That is, if I count unlabeled points classified as negative as true negatives (TN), as suggested here: Metrics for one-class classification

It follows that this way of calculating the metrics favors an extremely overfitted model. A k-NN with k = 1 would yield the highest score.
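To make the arithmetic concrete, here is a minimal sketch of the scenario. It assumes unlabeled points are treated as ground-truth negatives, and the "model" is just a stand-in that reproduces the given labels; the helper name `confusion_counts` is made up for illustration.

```python
# Setup from the question: 100 labeled positives, 1,000 unlabeled points.
n_pos, n_unl = 100, 1000

# Assumption: every unlabeled point is treated as a true negative (label 0).
y_true = [1] * n_pos + [0] * n_unl

# Stand-in for a memorizing model: positives -> 1, all unlabeled -> 0.
y_pred = list(y_true)


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy, precision, recall)  # 1.0 1.0 1.0
```

Every metric is perfect even though no unlabeled point was ever recognized as a hidden positive, which is exactly the problem.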

What are some better metrics that will favor a model that is better “in reality”? Should I perhaps just use TP?

user438236
