Differences between probabilistic regression + threshold and classification?

Question

Working on probabilistic models, we often end up thresholding the result to decide whether we should take some action or not. This method allows simple and explicit decisions, while being adaptable to our means.

It appears to be roughly equivalent to using a classification technique while changing the Type I / II error costs. However, this second method appears to be less transparent (no individual probabilities, no simple interpretation of the output) and less adaptable to our means.

What technical differences might we have missed between thresholding a probabilistic model and changing the cost of errors in a classification model?

score 1 · Answer 1 · answered Jun 25 '19 at 15:45

The critical issue is how the model is fit.

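In practice, classification models are often fit based on criteria like accuracy, sensitivity or specificity. These are improper scoring rules, for which optimization need not occur at the true set of probabilities. An improper scoring rule can lead to silly results: for example, when only 5% of cases are positive, a model that always predicts the negative class reaches 95% accuracy while being useless for making decisions.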
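To make the point about where the optimum occurs concrete, here is a minimal sketch. It assumes a single binary event with a made-up true probability of 0.05 and a 0.5 cutoff for turning a predicted probability into a label; the function names are just illustrative, not from any library. It compares which predicted probabilities `q` are optimal under expected log loss versus expected accuracy.

```python
import math

def expected_log_loss(p_true, q):
    """Expected logarithmic loss when the event has probability p_true and we predict q."""
    return -(p_true * math.log(q) + (1 - p_true) * math.log(1 - q))

def expected_accuracy(p_true, q, cutoff=0.5):
    """Expected accuracy when q is converted to a hard label at `cutoff`."""
    predict_positive = q >= cutoff
    return p_true if predict_positive else 1 - p_true

p_true = 0.05                            # hypothetical true event probability
grid = [i / 100 for i in range(1, 100)]  # candidate predictions 0.01 .. 0.99

# Log loss (a proper rule) has a single best prediction on the grid.
best_q_log_loss = min(grid, key=lambda q: expected_log_loss(p_true, q))

# Accuracy (an improper rule) is maximized by every q on one side of the cutoff.
best_accuracy = max(expected_accuracy(p_true, q) for q in grid)
accuracy_optimal_qs = [q for q in grid if expected_accuracy(p_true, q) == best_accuracy]

print("log loss is minimized at q =", best_q_log_loss)
print("accuracy is maximized by", len(accuracy_optimal_qs), "different values of q")
```

Under these assumptions the log loss singles out the true probability, while accuracy cannot tell a prediction of 0.01 from one of 0.49, so optimizing it gives no reason to report the right probability.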

It's best to use a proper scoring rule like the logarithmic rule to fit a probability model and then choose your decision threshold based on costs. This nicely separates the statistical issue from the decision component, which you have already identified as an advantage of that approach. Furthermore, if your cost estimates change later, you don't have to refit the entire model; you just adjust the decision threshold accordingly.
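As a rough sketch of the decision step, assume correct decisions cost nothing, a false positive costs `cost_fp` and a false negative costs `cost_fn` (these names, and the numbers below, are made up for illustration). Acting has lower expected cost than not acting when p * cost_fn > (1 - p) * cost_fp, which rearranges to p > cost_fp / (cost_fp + cost_fn).

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which acting has lower expected cost than not acting.

    Assumes correct decisions cost nothing; cost_fp and cost_fn are positive.
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(probabilities, cost_fp, cost_fn):
    """Turn predicted probabilities into act / don't-act decisions."""
    threshold = cost_threshold(cost_fp, cost_fn)
    return [p > threshold for p in probabilities]

# Hypothetical predicted probabilities from an already fitted model.
predicted = [0.05, 0.2, 0.6]

# Same model output, two different cost structures: only the threshold moves.
equal_costs = decide(predicted, cost_fp=1, cost_fn=1)
costly_misses = decide(predicted, cost_fp=1, cost_fn=9)

print(equal_costs)
print(costly_misses)
```

The fitted probabilities are untouched between the two calls; changing the costs only changes the cutoff, which is the separation described above.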
