
Assume you have a binary classification model $M$, i.e., for an input $x$ it outputs a number $M(x)=\hat{y}$ where $\hat{y}\in[0,1]$, predicting the binary label $y\in\{0,1\}$.

For example, a model that receives an image $x$ and outputs whether $x$ has a cat in it or not.

If such a model $M$ has a high AUC-ROC (0.9+) on a large test dataset $(X, Y)$, does that mean that $\hat{y}$ is, in some sense, the probability that $y=1$ (i.e., $P(y=1)=\hat{y}$)?

Are there any resources (articles, books, etc.) on the relationship between $P(y=1)$ and $\hat{y}$?

This question touches on the basics of classification models, and yet I couldn't find any good resources on the subject.

David Taub

  • The boundary here is between probabilistic modeling and decision theory. It's common for ML folks to decide that `y=1` when `P(y=1) > threshold`, though on this site we seem to agree that thresholding is bad for probabilistic modeling. – Arya McCarthy Nov 01 '21 at 18:30
  • This gets at model calibration, but even if the AUC is low, it is reasonable to say that $\hat y$ is your best estimate of $P(y = 1\vert\text{data})$. – Dave Nov 01 '21 at 18:32
  • You can use math formatting via MathJax. More information: https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference – Sycorax Nov 01 '21 at 18:55
  • I am not quite sure what your assumptions are, but you could have the same AUC with a monotonic increasing transformation of your probability estimate (e.g. sqrt). – seanv507 Nov 01 '21 at 21:54
  • @Sycorax, thanks, I didn't see the $$ trick anywhere in the tips above the question text box... it is very useful! – David Taub Nov 02 '21 at 06:56

1 Answer


YES AND NO

FIRST THE YES

In theory, this is true. Start with logistic regression: it explicitly models the log-odds, which you can convert to a probability. A neural network with a sigmoid activation function on the final node behaves the same way as the inverse link function in a logistic regression. You're trying to get the probability.
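Below is a minimal sketch of that link (assuming the standard inverse-logit form; the `log_odds` value here is a made-up linear-predictor output, not anything from the question):

```python
import numpy as np

# Sigmoid (inverse logit): maps a linear predictor's log-odds
# to a probability in (0, 1), as in logistic regression.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

log_odds = 2.0            # hypothetical linear-predictor output
print(sigmoid(log_odds))  # ~0.88, read as P(y = 1 | x)
```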

NOW THE NO

Many machine learning models have poor calibration. The sklearn documentation has some nice discussion of this. I also have an open question about machine learning (particularly neural network) overconfidence. If your model has poor calibration, then it isn't really reasonable to claim that $\hat y_i = p$ means that $P(Y_i = 1) = p$, since the model is, in some sense, not telling the truth.
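As a concrete check, here is a sketch (using sklearn's `calibration_curve` on synthetic data, so the dataset and model are assumptions, not anything from the question) that also illustrates seanv507's comment above: a monotone increasing distortion such as a square root leaves the AUC untouched while breaking calibration.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Fit a simple model and score held-out data.
X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
p_hat = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# AUC is rank-based, so a monotone increasing transform (here sqrt)
# leaves it unchanged even though the scores are no longer calibrated.
print(roc_auc_score(y_te, p_hat), roc_auc_score(y_te, np.sqrt(p_hat)))

# Reliability diagram data: for calibrated scores, prob_true ≈ prob_pred.
prob_true, prob_pred = calibration_curve(y_te, np.sqrt(p_hat), n_bins=10)
print(np.column_stack([prob_pred, prob_true]))
```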

Dave