I would like to tune the classification threshold for the following task with tuneThreshold, in conjunction with tuning a learner parameter. I first tried tuning the threshold during the tuning of the learner by setting makeTuneControlRandom(..., tune.threshold = TRUE):
library(ElemStatLearn)
library(mlr)
data(spam)
task = makeClassifTask(data = spam, target = "spam")
lrn1 = makeLearner("classif.gbm", predict.type = "prob")
ps = makeParamSet(
makeIntegerParam("interaction.depth", lower = 1, upper = 5)
)
ctrl = makeTuneControlRandom(maxit = 2, tune.threshold = TRUE)
lrn2 = makeTuneWrapper(lrn1, par.set = ps, control = ctrl, resampling = cv2)
r = resample(lrn2, task, cv3, extract = getTuneResult)
print(r$extract)
[[1]]
Tune result:
Op. pars: interaction.depth=4
Threshold: 0.52
mmce.test.mean=0.0586857

[[2]]
Tune result:
Op. pars: interaction.depth=5
Threshold: 0.54
mmce.test.mean=0.0557573

[[3]]
Tune result:
Op. pars: interaction.depth=5
Threshold: 0.51
mmce.test.mean=0.0514993
Here the best tuning result (the third outer fold) has a threshold of 0.51.
I then tried tuning the threshold by applying tuneThreshold directly to the prediction object:

tuneThreshold(r$pred)
$th
[1] 0.5650756
$perf
mmce
0.05303195
Here the optimal threshold is 0.565. I don't understand why this differs from the 0.51 found above. It seems there is another layer of randomness somewhere, but I can't tell where or how: calling tuneThreshold(r$pred) again returns exactly the same threshold and performance score. How does tuneThreshold work exactly? What does it do with the prediction object r$pred?
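For completeness, here is how I inspected r$pred. My working assumption (which I have not confirmed in the docs) is that r$pred pools the held-out predictions from all three outer folds into one ResamplePrediction, so tuneThreshold optimizes a single threshold over the pooled predictions rather than per fold:

```r
library(mlr)

# r$pred should contain the test-set predictions of all outer CV folds;
# the iter column records which fold each row came from
head(as.data.frame(r$pred))
table(r$pred$data$iter)  # one entry per outer fold

# If that assumption holds, applying the tuned threshold manually to the
# pooled prediction should reproduce tuneThreshold's reported performance
res = tuneThreshold(r$pred)
performance(setThreshold(r$pred, res$th), measures = mmce)
```

This would also explain why repeated calls are deterministic: r$pred is fixed once resample has run, so the only randomness was in the resampling itself.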