I would like to tune the classification threshold for the following task with tuneThreshold, in conjunction with tuning a learner parameter. I first tried tuning the threshold during the tuning of the learner by setting makeTuneControlRandom(..., tune.threshold = TRUE):
library(ElemStatLearn)
library(mlr)
data(spam)
task = makeClassifTask(data = spam, target = "spam")
lrn1 = makeLearner("classif.gbm", predict.type = "prob")
ps = makeParamSet(
makeIntegerParam("interaction.depth", lower = 1, upper = 5)
)
ctrl = makeTuneControlRandom(maxit = 2, tune.threshold = TRUE)
lrn2 = makeTuneWrapper(lrn1, par.set = ps, control = ctrl, resampling = cv2)
r = resample(lrn2, task, cv3, extract = getTuneResult)
print(r$extract)
[[1]]
Tune result:
Op. pars: interaction.depth=4
Threshold: 0.52
mmce.test.mean=0.0586857

[[2]]
Tune result:
Op. pars: interaction.depth=5
Threshold: 0.54
mmce.test.mean=0.0557573

[[3]]
Tune result:
Op. pars: interaction.depth=5
Threshold: 0.51
mmce.test.mean=0.0514993
Here the best tuning result (the third outer fold) has a threshold of 0.51.
I then tried tuning the threshold by applying tuneThreshold directly to the prediction object:

tuneThreshold(r$pred)
$th
[1] 0.5650756
$perf
mmce
0.05303195
Here the optimal threshold is 0.565. I don't understand why this differs from the 0.51 found above. It seems there is another layer of randomness somewhere, but I can't tell where or how: calling tuneThreshold(r$pred) again returns exactly the same threshold and performance score. How does tuneThreshold work exactly? What does it do with the prediction object r$pred?
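For completeness, here is how I inspected r$pred. My working assumption (which I have not confirmed in the docs) is that r$pred pools the held-out predictions from all three outer folds into one ResamplePrediction, so tuneThreshold optimizes a single threshold over the pooled predictions rather than per fold:

```r
library(mlr)

# r$pred should contain the test-set predictions of all outer CV folds;
# the iter column records which fold each row came from
head(as.data.frame(r$pred))
table(r$pred$data$iter)  # one entry per outer fold

# If that assumption holds, applying the tuned threshold manually to the
# pooled prediction should reproduce tuneThreshold's reported performance
res = tuneThreshold(r$pred)
performance(setThreshold(r$pred, res$th), measures = mmce)
```

This would also explain why repeated calls are deterministic: r$pred is fixed once resample has run, so the only randomness was in the resampling itself.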