I'm curious whether there are any useful metrics for evaluating classification models based on the numeric probabilities they output.
Traditionally, I would train a classification model, generate factor predictions on the test set, and use a confusion matrix or ROC curve to pick the best model. In this instance, however, I'm interested in evaluating the models by looking at the numeric probabilities directly.
Update
An example of what I'm talking about is this: I fit multiple models and have each predict classes on the test set. Usually I can then create a confusion matrix for each model:
Model 1 (rows = predicted, columns = actual):

        Yes   No
  Yes    10    5
  No      2   13

Model 2 (rows = predicted, columns = actual):

        Yes   No
  Yes     3   11
  No      8    8
From the confusion matrices, I can clearly tell that model 1 is more accurate than model 2: its accuracy is (10 + 13)/30 ≈ 0.77, versus (3 + 8)/30 ≈ 0.37 for model 2.
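For concreteness, here is a rough sketch of how I pull the usual label-based metrics out of those two matrices (Python with NumPy, purely as an illustration since no language is tied to the question; I'm taking rows as predicted and columns as actual):

```python
import numpy as np

# Confusion matrices from above: rows = predicted (Yes, No), columns = actual (Yes, No).
model_1 = np.array([[10,  5],
                    [ 2, 13]])
model_2 = np.array([[ 3, 11],
                    [ 8,  8]])

for name, cm in [("Model 1", model_1), ("Model 2", model_2)]:
    accuracy = np.trace(cm) / cm.sum()       # (TP + TN) / total
    sensitivity = cm[0, 0] / cm[:, 0].sum()  # TP / (TP + FN), i.e. recall on "Yes"
    specificity = cm[1, 1] / cm[:, 1].sum()  # TN / (TN + FP)
    print(f"{name}: accuracy={accuracy:.2f}, "
          f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```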
How would I evaluate the two models if I had them give me numeric probabilities instead? For instance:
Model-1 Preds   Model-2 Preds   Test Set
     .59             .25           No
     .14             .08           No
     ...             ...           ...
     .33             .29           Yes
I have thought about discretizing the probabilities, or converting the yes/no labels into 1s and 0s and calculating the residuals. I just wanted to know whether there are more formal best practices for this case.
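To make the residual idea concrete, here is a rough sketch on just the three visible rows from the table above, with Yes encoded as 1 and No as 0 (Python with scikit-learn, again only as an illustration; the mean squared residual computed this way is what's usually called the Brier score, and log loss is shown alongside it):

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss

# Only the three visible rows from the table above; the elided rows are left out.
# "Yes" is encoded as 1, "No" as 0.
y_true = np.array([0, 0, 1])
model_1_probs = np.array([0.59, 0.14, 0.33])
model_2_probs = np.array([0.25, 0.08, 0.29])

for name, probs in [("Model 1", model_1_probs), ("Model 2", model_2_probs)]:
    brier = brier_score_loss(y_true, probs)  # mean squared residual vs. 0/1 outcome
    ll = log_loss(y_true, probs)             # heavily penalizes confident wrong probabilities
    print(f"{name}: Brier score={brier:.3f}, log loss={ll:.3f}")
```

Lower is better for both, and either score can be compared across the two models without discretizing the probabilities first.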