I have an issue I would like some feedback on:
In conventional machine learning we try to maximise accuracy and other metrics of our models, and are generally satisfied with anything over 80% accuracy.
Problem
However, what if the requirement is instead:
For any new input, if you're absolutely certain of predicting it correctly, then predict. Otherwise, fall back to the conventional system.
Intuitively, one would use the predicted probability scores as a measure of confidence in the prediction. But what if the probability scores are wrong? E.g. the model predicts label A with 99% probability, but the real answer is label B.
Is there a way to tackle such a problem? I suspect there is something wrong with the data, but we'll never have perfect data, so is there a way to solve this with the existing data?
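To make that requirement concrete, here is a minimal sketch of the "predict only above a probability threshold, otherwise fall back" idea; the synthetic data, RandomForestClassifier, and 0.99 threshold below are placeholders, not my actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real dataset and model would go here.
X, y = make_classification(n_samples=2000, n_informative=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

THRESHOLD = 0.99  # only act on predictions at least this confident

proba = clf.predict_proba(X_new)              # shape (n_samples, n_classes)
confidence = proba.max(axis=1)                # top-class probability per sample
predicted = clf.classes_[proba.argmax(axis=1)]

accept = confidence >= THRESHOLD              # the model answers these
fallback = ~accept                            # the conventional system handles the rest

print(f"coverage: {accept.mean():.2%}, "
      f"accuracy on accepted: {(predicted[accept] == y_new[accept]).mean():.2%}")
```

The question is essentially how to choose (or replace) that confidence score so that the accepted subset is actually near-certain.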
Attempts
I have tried calibrating the probabilities, but it does not seem to help much. I obtain the following calibration reliability curve:
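For context, the calibration attempt was roughly along these lines, assuming scikit-learn's CalibratedClassifierCV with isotonic regression (again, the model and data are synthetic placeholders, not my actual pipeline):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data again.
X, y = make_classification(n_samples=2000, n_informative=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

# Isotonic calibration fitted on cross-validation folds
# ('sigmoid', i.e. Platt scaling, is the other built-in option).
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(random_state=0), method="isotonic", cv=5
).fit(X_train, y_train)

proba = calibrated.predict_proba(X_new)[:, 1]

# Reliability curve: observed positive frequency per predicted-probability bin.
frac_pos, mean_pred = calibration_curve(y_new, proba, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"mean predicted {p:.2f} -> observed {f:.2f}")
```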
I have also tried ensembling, but again it doesn't solve the problem at hand: all the classifiers are confident when they predict the wrong answer.
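The ensembling check was along these lines: only accept a prediction when several different model families all agree and all clear the confidence threshold (the models and data below are once more placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_informative=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

# Deliberately different model families, hoping their mistakes are less correlated.
models = [
    LogisticRegression(max_iter=1000).fit(X_train, y_train),
    RandomForestClassifier(random_state=0).fit(X_train, y_train),
    GradientBoostingClassifier(random_state=0).fit(X_train, y_train),
]

THRESHOLD = 0.99
probas = np.stack([m.predict_proba(X_new) for m in models])  # (n_models, n_samples, n_classes)
preds = probas.argmax(axis=2)                                # each model's predicted class index

# Only predict when every model agrees AND every model clears the threshold.
unanimous = (preds == preds[0]).all(axis=0)
confident = (probas.max(axis=2) >= THRESHOLD).all(axis=0)
accept = unanimous & confident

# The failure mode I keep hitting: some accepted predictions are still wrong.
still_wrong = accept & (preds[0] != y_new)
print(f"accepted: {accept.mean():.2%}, wrong among accepted: {still_wrong.sum()}")
```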
This has really stumped me for the past few days. Would appreciate your insights on this :)