To score a RandomForestClassifier using GridSearchCV for multiclass classification, I decided to use the Brier score.
However, I could only manage to get the Brier score for each class separately.
Is it reasonable to take the average of those per-class scores as an overall performance measure? Or can you think of a better way?
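To make the averaging idea concrete, here is a minimal sketch of a scorer that computes one Brier score per class from predicted probabilities and averages them. The name `neg_mean_brier_scorer` is hypothetical, and the sketch assumes the estimator exposes `predict_proba` and `classes_` (as RandomForestClassifier does). The result is negated because GridSearchCV treats larger scores as better.

```python
import numpy as np

def neg_mean_brier_scorer(estimator, X, y):
    """Hypothetical scorer: negative average of the one-vs-rest Brier scores."""
    proba = np.asarray(estimator.predict_proba(X))   # shape (n_samples, n_classes)
    classes = np.asarray(estimator.classes_)
    # One-hot encode y using the same column order as predict_proba.
    onehot = (np.asarray(y)[:, None] == classes[None, :]).astype(float)
    # Mean squared difference per column gives one Brier score per class.
    per_class = np.mean((proba - onehot) ** 2, axis=0)
    # Negate so that a lower Brier score yields a higher (better) score.
    return -per_class.mean()
```

Since the `scoring` parameter of GridSearchCV accepts a callable with the signature `(estimator, X, y)`, a function like this could be passed as `scoring=neg_mean_brier_scorer`.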
Edit: I am aware this question is similar, so I'll explain why I think it's a different problem:
When I run my model with brier_score as defined by that question's author (brier_multi), the score obtained for the best model is 202.3.
However, when I apply the following code (written by me):
from numpy import mean
from sklearn.metrics import brier_score_loss
from sklearn.preprocessing import label_binarize

def brier_score_multi(y_true, y_pred):
    # Binarize both arrays one-vs-rest; y_pred is treated as hard class labels here.
    y_true_bin = label_binarize(y_true, classes=[0, 1, 2])
    y_pred_bin = label_binarize(y_pred, classes=[0, 1, 2])
    # Average the three per-class Brier scores.
    return mean([brier_score_loss(y_true_bin[:, k], y_pred_bin[:, k]) for k in range(3)])
The best score is 0.0432.
As you can see, this is a big difference. Given the definition of the Brier score (a mean squared error on probabilities, and therefore bounded), I'm inclined to trust the second result.
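One caveat about the second result, assuming y_pred holds hard class labels from `predict()` rather than probabilities from `predict_proba()`: after binarization every prediction is exactly 0 or 1, so the averaged score depends only on the misclassification rate and ignores how confident the model was. The sketch below uses made-up toy data and a hypothetical helper `mean_brier` to show the difference.

```python
import numpy as np

def mean_brier(y_true, scores, n_classes=3):
    """Average of the per-class Brier scores; `scores` has shape (n, n_classes)."""
    onehot = np.eye(n_classes)[np.asarray(y_true)]
    # All classes share the same n, so the overall mean equals the mean of per-class means.
    return np.mean((np.asarray(scores, dtype=float) - onehot) ** 2)

# Toy data (invented for illustration): 4 samples, 3 classes.
y_true = np.array([0, 1, 2, 2])
proba = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.4, 0.3]])

hard_labels = proba.argmax(axis=1)        # what predict() would return
hard_onehot = np.eye(3)[hard_labels]      # what binarizing those labels yields

brier_proba = mean_brier(y_true, proba)       # uses the predicted probabilities
brier_hard = mean_brier(y_true, hard_onehot)  # uses 0/1 indicators only

# With hard labels each misclassified sample contributes exactly 2 squared errors,
# so the result equals (2 / n_classes) * error_rate.
error_rate = np.mean(hard_labels != y_true)
```

If that assumption holds, passing the output of `predict_proba` instead of binarized labels would be needed to get a Brier score in the usual sense.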
EDIT 2:
Seeing as the first result is incorrect, I started thinking: maybe the sum of the per-class Brier scores makes more sense than their average?
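For reference, the sum and the average differ only by a constant factor. In notation introduced here for illustration, let $p_{ik}$ be the predicted probability of class $k$ for sample $i$, $o_{ik}$ the 0/1 indicator that sample $i$ belongs to class $k$, $N$ the number of samples, and $K$ the number of classes:

```latex
\mathrm{BS}_k = \frac{1}{N}\sum_{i=1}^{N}\left(p_{ik}-o_{ik}\right)^2,
\qquad
\mathrm{BS}_{\text{sum}} = \sum_{k=1}^{K}\mathrm{BS}_k
  = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}\left(p_{ik}-o_{ik}\right)^2
  = K\cdot\mathrm{BS}_{\text{avg}}
```

Because $K$ is fixed for a given problem, both versions should rank candidate models identically in a grid search. The sum corresponds to Brier's original multiclass formulation and ranges from 0 to 2, while the average ranges from 0 to $2/K$.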