
I have long struggled with setting a valid threshold t for turning the predictions of my binary logistic model into classes and then evaluating how well it performs (see code below). I used to believe that setting a threshold for binary prediction was more subjective than statistical. After reading both Stephan Kolassa's and Tamas Ferenci's thoughts here and here, it has been confirmed for me that setting a threshold is a decision-theoretic matter rather than a statistical one. However, I have no prior knowledge in that field.

So, assume I have to predict whether a fire occurs or not. I first run my ElasticNet model on my training data and then evaluate it on my test data. I reach the point where I have to set a threshold for the binary outcome to be either 0 (no fire) or 1 (fire) (note that the data is highly imbalanced, hence the low threshold, see code). Predicting 0's as 1's and vice versa is not the end of the world in my case, unlike predicting cancer as no-cancer in the medical world, but it still makes a substantial difference whether I choose t = 0.0012 or t = 0.0007.

Note about the data: it consists of 25 variables and 620 000 observations, all on a continuous scale except the dependent variable, which is a two-level factor. One could use the iris dataset, restricted to two classes of the dependent variable, to simulate my dataset.

library(glmnet)   # cv.glmnet()
library(caret)    # confusionMatrix()

set.seed(123)
# i indexes the candidate alpha values (0, 0.1, ..., 1) in the surrounding loop
model <- cv.glmnet(x.train, y.train, type.measure = "auc", alpha = i/10,
                   family = "binomial", parallel = TRUE)

predicted <- predict(model, s = "lambda.1se", newx = x.test, type = "response")
auc <- model$cvm                               # cross-validated AUC along the lambda path
t <- 0.001                                     # probability cutoff
predict_binary <- ifelse(predicted > t, 1, 0)
CM <- confusionMatrix(as.factor(predict_binary), as.factor(y.test))
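
The snippet above assumes pre-built `x.train`, `y.train`, `x.test`, `y.test` and a loop index `i` over alpha values, none of which are shown. A minimal self-contained sketch with simulated, similarly imbalanced data (the simulation and all object names are placeholders, not the original data):

library(glmnet)

set.seed(123)
n <- 50000; p <- 25
x <- matrix(rnorm(n * p), n, p)                # 25 continuous predictors
eta <- -7 + x %*% rnorm(p, 0, 0.3)             # linear predictor giving rare events
y <- rbinom(n, 1, plogis(eta))                 # rare-event 0/1 outcome

train <- sample(n, 0.8 * n)                    # 80/20 train/test split
x.train <- x[train, ];  y.train <- y[train]
x.test  <- x[-train, ]; y.test  <- y[-train]

fit <- cv.glmnet(x.train, y.train, type.measure = "auc",
                 alpha = 0.5, family = "binomial")
predicted <- predict(fit, s = "lambda.1se", newx = x.test, type = "response")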

COEFFICIENTS
(Intercept)    -1.212497e+01
V1             -4.090224e-03
V2             -6.449927e-04
V3             -2.369445e-04
V4              9.629067e-03
V5              4.987248e-02
V6              .           
V7             -1.254231e-02
V8              .           
V9              5.330301e-06
V10             .           
V11             7.795364e-03
V12             .   


Depending on the threshold t, I get the following confusion matrices.

t = 0.001                     t = 0.0012                    t = 0.0007
          Reference                     Reference                     Reference
Prediction      0      1      Prediction      0      1      Prediction      0      1
         0 107019     15               0 109857     17               0  99836     11
         1  17039     32               1  14201     30               1  24222     36
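
For reference, a minimal sketch (assuming `predicted` and `y.test` from the code above) of how such a comparison can be tabulated for any set of candidate thresholds:

# Compute a confusion matrix for each candidate threshold
thresholds <- c(0.0007, 0.001, 0.0012)
sweep <- lapply(thresholds, function(t) {
  predict_binary <- factor(ifelse(predicted > t, 1, 0), levels = c(0, 1))
  caret::confusionMatrix(predict_binary, factor(y.test, levels = c(0, 1)),
                         positive = "1")
})
names(sweep) <- thresholds
# e.g. sensitivity and specificity per threshold
sapply(sweep, function(cm) cm$byClass[c("Sensitivity", "Specificity")])
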
  1. How can one justify choosing one threshold value over another?
  2. How can one maximize the number of true positives while minimizing the number of false positives?
  3. Is there any way in R for choosing a 'best' threshold for binary outcomes?
Thomas
  • What are your relative costs of false-positive and false-negative errors? – EdM Jul 22 '20 at 23:12
  • @EdM, are you asking about what a false-positive or false-negative prediction would cost relatively? Such as in the cancer example, we are talking about the added health and social cost with a misclassification for instance. If so, then I have no idea. It could be expensive - say one misclassified non-fire evolves into a big fire creeping into a nearby neighborhood. – Thomas Jul 22 '20 at 23:19
  • That, however, is exactly the decision you make when you choose a probability cutoff whether you are conscious of it or not. – EdM Jul 23 '20 at 00:55
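
To make EdM's point concrete: under a simple cost model with a cost c_FP for a false alarm, a cost c_FN for a missed fire, and zero cost for correct decisions, the expected-cost-minimizing cutoff on a calibrated probability is t* = c_FP / (c_FP + c_FN). The costs below are hypothetical placeholders, purely for illustration:

# Hypothetical misclassification costs, purely for illustration
c_fp <- 1      # cost of a false alarm (predict fire, no fire occurs)
c_fn <- 1500   # cost of a missed fire (predict no fire, fire occurs)

# With calibrated probabilities and zero cost for correct decisions,
# the expected-cost-minimizing cutoff is:
t_star <- c_fp / (c_fp + c_fn)
t_star   # ~0.00067, in the same range as the cutoffs tried above

# Classify by comparing the predicted probability to the cost-based cutoff
predict_binary <- ifelse(predicted > t_star, 1, 0)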

1 Answer


+1 to EdM's comments. If you do not know the costs of inappropriate decisions, then you cannot set an optimal threshold, particularly since you may have more than one possible decision here: if the probability of a fire is low, do nothing; if it is somewhat higher, send a police squad car to investigate; if it is higher still, send out the fire brigade; at the top end, alert neighboring departments.

So: do not deal with thresholds at all. Use the output of your model as a probabilistic prediction, and assess its quality using proper scoring rules.

Stephan Kolassa
  • Thank you Stephan. I have read, what feels like, all of your answers and comments on this topic on CV, and Frank Harrell's blogs. But sorry, I still do not understand how to go about the next steps of the evaluation of my regression model. Would you mind elaborating on probabilistic predictions and scoring rules? I do not understand that part. I have added the output of my given model. What proper scoring rule should I use? Brier? Thank you again. – Thomas Jul 23 '20 at 07:42
  • Your `predicted` is a probabilistic prediction: it is between 0 and 1 and can be interpreted as a probability. You can compare it to your ground truth `y.test`, which I will assume to be a vector of `0` and `1`. The [Brier score](https://en.wikipedia.org/wiki/Brier_score) would then simply be `mean((predicted-y.test)^2)`. Alternatively, the [log score](https://en.wikipedia.org/wiki/Scoring_rule#ProperScoringRules) would be `-mean(log(c(predicted[y.test==1],1-predicted[y.test==0])))`. Either one is negatively oriented (smaller is better) and can be used to decide between competing models. ... – Stephan Kolassa Jul 23 '20 at 09:16
  • ... As to which scoring rule to use, I recommend [Merkle & Steyvers (2013, *Decision Analysis*)](https://pubsonline.informs.org/doi/abs/10.1287/deca.2013.0280). I also very much like Tilmann Gneiting's papers, e.g. [Gneiting & Raftery (2007, *JASA*)](https://www.tandfonline.com/doi/abs/10.1198/016214506000001437) or [Gneiting & Katzfuss (2014, *Ann Rev Stat App*)](https://www.annualreviews.org/doi/10.1146/annurev-statistics-062713-085831). Good luck! – Stephan Kolassa Jul 23 '20 at 09:20
  • Thank you so much. I really value your input, and I see how much effort you put into these replies, not only to mine but to others' as well. It has all taught me a lot about evaluating my model, accuracy as an improper scoring rule, etc. Thank you very much. – Thomas Jul 23 '20 at 10:07
  • I have some additional questions now that I have read your comments, posts and articles again. 1) With my imbalanced data (say 99.5 % 0's, 0.5 % 1's), isn't it evident my `Brier score` would be low (~0.0005), a bit like the `accuracy` always being high? 2) Would the `Brier score` alone be satisfying in itself to report on the model's prediction performance? 3) Eventually, how can this proper scoring rule help me predict 1's as 1's and 0's as 0's more precisely? Thank you again! – Thomas Jul 23 '20 at 12:34
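
For completeness, a minimal sketch of the scoring-rule evaluation described in the comments above, assuming `predicted` (the probabilistic predictions from the question's code) and `y.test` as a 0/1 vector:

# Brier score: mean squared difference between predicted probability and outcome
brier <- mean((as.vector(predicted) - y.test)^2)

# Log score (negative mean log-likelihood of the observed outcomes)
logscore <- -mean(log(c(predicted[y.test == 1], 1 - predicted[y.test == 0])))

# Both are negatively oriented: smaller values indicate better probabilistic
# predictions, and either can be compared across competing models.
c(Brier = brier, Log = logscore)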