I've fit a logistic regression model (via `train`) for a binary response, and I've obtained its confusion matrix via `confusionMatrix` in `caret`. It gives me the confusion matrix for the logistic model, though I'm not sure what threshold is being used to obtain it. How do I obtain the confusion matrix for specific threshold values using `confusionMatrix` in `caret`?

- I don't have an answer, but often questions like this are answered in the help file. If that fails, you can look at the source code itself. You can print the source to the console by typing `confusionMatrix`, without parentheses. – shadowtalker Aug 07 '14 at 01:36
- It isn't quite clear what you have done exactly. Did you call the `glm` function from the `stats` package and pass its result to `confusionMatrix`? I didn't know one could do that, and reading the manual it isn't clear one can at all. Or did you `predict` something? A short example would help. – Calimo Aug 07 '14 at 06:33
- @Calimo I've used the `train` function in `caret` to fit the model, which lets me specify it as a glm with binomial family. I then used the `predict` function on the object generated via `train` (a minimal sketch of that setup is below). – Black Milk Aug 07 '14 at 20:14
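For reference, a minimal version of the setup described in the comments might look like this (a sketch only: `train_df`, `test_df`, and the factor column `response` with levels `"no"`/`"yes"` are placeholder names, not from the original post):

```r
library(caret)

# The response must be a factor for caret to treat this as classification
fit <- train(response ~ ., data = train_df,
             method = "glm", family = "binomial")

# Hard class predictions on the held-out data ...
predClass <- predict(fit, newdata = test_df)

# ... which is what gets summarised by confusionMatrix
confusionMatrix(predClass, test_df$response)
```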
2 Answers
There is a pretty easy way, assuming `tune <- train(...)`:

```r
probsTest <- predict(tune, test, type = "prob")
threshold <- 0.5
pred <- factor(ifelse(probsTest[, "yes"] > threshold, "yes", "no"))
pred <- relevel(pred, "yes")   # you may or may not need this; I did
confusionMatrix(pred, test$response)
```
Obviously, you can set `threshold` to whatever you want to try, or pick the "best" one, where "best" means the highest combined sensitivity and specificity:

```r
library(pROC)

probsTrain <- predict(tune, train, type = "prob")
rocCurve <- roc(response = train$response,
                predictor = probsTrain[, "yes"],
                levels = rev(levels(train$response)))
plot(rocCurve, print.thres = "best")
```
After looking at the example Max posted, I'm not sure whether there are some statistical nuances that make my approach less desirable.
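If you want that "best" threshold as a number rather than reading it off the plot, you can pull it straight out of the `roc` object (a sketch, assuming the `rocCurve`, `probsTest`, and `test` objects from above; the criterion is simply the largest sensitivity + specificity, and you could weight the two terms differently if one kind of error matters more to you):

```r
# Index of the ROC point that maximises sensitivity + specificity (Youden's J)
bestIdx <- which.max(rocCurve$sensitivities + rocCurve$specificities)
bestThreshold <- rocCurve$thresholds[bestIdx]

# Rebuild the confusion matrix on the test set at that threshold
predBest <- factor(ifelse(probsTest[, "yes"] > bestThreshold, "yes", "no"),
                   levels = levels(test$response))
confusionMatrix(predBest, test$response)
```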

- In the output `rocCurve` plot, what do the three values mean? E.g. on my data it says 0.289 (0.853, 0.831). Does the 0.289 signify the best threshold that one should use in demarcating the binary outcome? I.e. every case with a predicted probability > 0.289 would be coded "1" and every case with a predicted probability < 0.289 would be coded "0", rather than the 0.5 default threshold of the `caret` package? – coip Feb 15 '18 at 23:14
- Yep, that's exactly right, and the other two values in parentheses are sensitivity and specificity (honestly, though, I forget which is which). – efh0888 Jul 31 '18 at 13:41
- Also, since then I figured out you can extract it from the ROC curve using `rocCurve$thresholds[which(rocCurve$sensitivities + rocCurve$specificities == max(rocCurve$sensitivities + rocCurve$specificities))]`, which also gives you the flexibility to weight them differently if you want... one last thing to note is that realistically, you probably want to tune the threshold (like you would with any model hyperparameter) as Max describes [here](http://appliedpredictivemodeling.com/blog/2014/2/1/lw6har9oewknvus176q4o41alqw2ow). – efh0888 Jul 31 '18 at 13:47
Most classification models in R produce both a class prediction and the probabilities for each class. For binary data, in almost every case, the class prediction is based on a 50% probability cutoff.

`glm` is the same. With `caret`, using `predict(object, newdata)` gives you the predicted class and `predict(object, newdata, type = "prob")` will give you class-specific probabilities (when `object` is generated by `train`).

You can do things differently by defining your own model and applying whatever cutoff you want. The `caret` website also has an example that uses resampling to optimize the probability cutoff.
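As a concrete illustration of the above (a sketch only; `object`, `newdata`, and the positive class label `"yes"` stand in for whatever your `train` object, new data, and factor levels actually are):

```r
# Hard class predictions: this is what confusionMatrix() is usually given
predClass <- predict(object, newdata)

# Class probabilities: a data frame with one column per class level
predProb <- predict(object, newdata, type = "prob")

# For a binary outcome the hard classes correspond to a 50% cutoff on the
# probabilities, so this manual version should reproduce predClass
# (ties at exactly 0.5, if any, depend on the underlying model code)
manualClass <- factor(ifelse(predProb[, "yes"] > 0.5, "yes", "no"),
                      levels = levels(predClass))
table(manualClass, predClass)
```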
tl;dr: `confusionMatrix` uses the predicted classes and thus a 50% probability cutoff.
Max
