I want to understand how to evaluate a (multiclass) classifier's performance when the classifier outputs not a predicted label but a score.
Imagine a classifier that tries to predict a house's value from some features. It outputs a score in [0, 1], where 0 means it predicts 'low' value and 1 means 'high' value.
I also have a test set, where each example carries one of three class labels: ['affordable', 'expensive', 'very expensive']. So after running the test set through the classifier, I will have output like this:
example   label             score
A         'affordable'      0.23
B         'very expensive'  0.56
C         'affordable'      0.54
D         'expensive'       0.80
...
You can see that the model makes a mistake on D: it gives D a higher score (0.80) than B (0.56), even though the true labels say D ('expensive') should score lower than B ('very expensive').
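To make that ordering idea concrete, here is a minimal sketch of how I could quantify such mistakes (Python; assumes scipy is available, and the data and variable names are just my toy example above): count pairwise ordering violations, and summarize the same pairwise agreement with Kendall's tau, a rank correlation between the ordinal labels and the scores:

```python
from itertools import combinations

from scipy.stats import kendalltau

# Ordinal encoding of the labels: affordable < expensive < very expensive
rank = {'affordable': 0, 'expensive': 1, 'very expensive': 2}

labels = ['affordable', 'very expensive', 'affordable', 'expensive']  # A, B, C, D
scores = [0.23, 0.56, 0.54, 0.80]
y = [rank[label] for label in labels]

# Count pairs whose score order contradicts their label order (like B vs D)
violations = sum(
    (yi - yj) * (si - sj) < 0
    for (yi, si), (yj, sj) in combinations(zip(y, scores), 2)
)
print("pairwise ordering violations:", violations)  # -> 1 (the B/D pair)

# Kendall's tau condenses the same pairwise agreement into one number in [-1, 1]
tau, p_value = kendalltau(y, scores)
print("Kendall's tau:", tau)
```

But I'm not sure whether a rank-based summary like this is the right way to judge such a classifier, hence my question below.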
My question is: how do I assess the performance of this classifier?
I understand that if I can somehow map the scores to the categories (i.e., find the decision boundaries, which here would be two thresholds for three classes), then I can perform the usual confusion matrix / ROC analysis. But how can I find that mapping?
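For concreteness, here is a rough sketch of what I imagine the mapping could look like: brute-force two thresholds t1 < t2 to maximize accuracy, then build a confusion matrix from the resulting discrete predictions. This is just one approach I came up with, not something I know to be standard; the names and the threshold grid are made up for illustration, and ideally the thresholds would be chosen on a separate validation split rather than on the same data being evaluated:

```python
from itertools import product

import numpy as np

rank = {'affordable': 0, 'expensive': 1, 'very expensive': 2}
labels = ['affordable', 'very expensive', 'affordable', 'expensive']  # A, B, C, D
scores = np.array([0.23, 0.56, 0.54, 0.80])
y = np.array([rank[label] for label in labels])

def predict(scores, t1, t2):
    # score < t1 -> class 0, t1 <= score < t2 -> class 1, score >= t2 -> class 2
    return np.digitize(scores, [t1, t2])

# Brute-force search over a grid of threshold pairs t1 < t2, maximizing accuracy
grid = np.linspace(0.0, 1.0, 101)
t1, t2 = max(
    ((a, b) for a, b in product(grid, grid) if a < b),
    key=lambda ts: np.mean(predict(scores, *ts) == y),
)
print("chosen thresholds:", t1, t2)

# With discrete predictions in hand, the usual 3x3 confusion matrix applies
pred = predict(scores, t1, t2)
cm = np.zeros((3, 3), dtype=int)
for true_class, pred_class in zip(y, pred):
    cm[true_class, pred_class] += 1
print(cm)
```

An alternative would be to skip thresholds entirely and score the ranking directly (as with Kendall's tau above), which is part of why I'm unsure which way to go.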
Thanks in advance!
PS: You don't know the implementation details of said classifier/regressor; treat it as a black box.