I'm looking for alternatives to the RPS (ranked probability score) to evaluate whether my model is "good", since I can't use the RPS itself: I do not have the actual probabilities.
My problem is a multiclass classification (30 categories): something will happen in 1 day, in 2 days, in 3 days, ..., in 30 days. I'm treating this as a classification problem because I need to predict probabilities (and many of the estimators from sklearn have a predict_proba method).
As I do not have the actual probabilities, I'm trying to find a measure that can be calculated using the actual class (instead of actual probability).
I was thinking of the Brier score (for multiclass) because of this question: How to Compute the Brier Score for more than Two Classes
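To make concrete what I mean, here is a minimal sketch of the multiclass Brier score as I understand it: the mean, over samples, of the squared difference between the predicted probability vector and the one-hot vector of the observed class. The function name `multiclass_brier_score` and the toy numbers are my own illustration (3 classes instead of 30), and I am assuming the classes are encoded as integers 0..n_classes-1 (so "in 1 day" would be class 0).

```python
import numpy as np

def multiclass_brier_score(y_true, proba, n_classes):
    """Mean squared distance between predicted probabilities and the
    one-hot encoding of the observed class (lower is better)."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    # Assumption: labels are integers in 0..n_classes-1.
    one_hot = np.eye(n_classes)[y_true]  # shape (n_samples, n_classes)
    return float(np.mean(np.sum((proba - one_hot) ** 2, axis=1)))

# Toy example with made-up predictions; in practice `proba` would come
# from something like model.predict_proba(X_test).
y_true = [0, 2, 1]
proba = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.2, 0.5, 0.3]]
score = multiclass_brier_score(y_true, proba, n_classes=3)
```

With this definition a perfect forecast scores 0 and the worst possible forecast scores 2. Note that it only needs the observed class, not the true probabilities.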
Is the Brier score a good way to evaluate my models? Can I choose the "best model" by Brier score and assume that this model would also yield the lowest RPS? Or should I use another metric?
Note: the classes are imbalanced (class 30 has a frequency of approximately 40%, compared with the other classes).