Suppose you have a binary classification model $M$, i.e. for an input $x$ it outputs a number $M(x)=\hat{y}$, where $\hat{y}\in[0,1]$ is a prediction of the binary label $y\in\{0,1\}$.
For example, a model that receives an image $x$ and outputs whether $x$ has a cat in it or not.
If such a model $M$ has a high AUC-ROC (0.9+) on a large test set $(X, Y)$, does that mean that $\hat{y}$ is, in some sense, the probability that $y=1$ (i.e. $P(y=1)=\hat{y}$)?
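To make the question concrete, here is a minimal sketch of the two quantities involved. Everything in it is an assumption for illustration: the data are synthetic, and the helper names `auc_roc` and `calibration_gap` are hypothetical, not from any library. It relies on the fact that AUC-ROC depends only on the ranking of the scores, so a strictly increasing transform of $\hat{y}$ (here, squaring) leaves the AUC unchanged even though the scores no longer match the label frequencies.

```python
import random

def auc_roc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_gap(labels, scores, n_bins=10):
    """Count-weighted average of |fraction of positives - mean score| over equal-width score bins."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((y, s))
    gap = 0.0
    for b in bins:
        if b:
            frac_pos = sum(y for y, _ in b) / len(b)
            mean_score = sum(s for _, s in b) / len(b)
            gap += len(b) / len(labels) * abs(frac_pos - mean_score)
    return gap

# Synthetic data (assumption): each example has a true probability p, and y ~ Bernoulli(p).
rng = random.Random(0)
true_p = [rng.betavariate(0.5, 0.5) for _ in range(1000)]
labels = [1 if rng.random() < p else 0 for p in true_p]

scores_a = true_p                     # scores equal to the true probabilities
scores_b = [p ** 2 for p in true_p]   # same ranking, but distorted values

auc_a, auc_b = auc_roc(labels, scores_a), auc_roc(labels, scores_b)
gap_a, gap_b = calibration_gap(labels, scores_a), calibration_gap(labels, scores_b)
print("AUC:", auc_a, auc_b)
print("calibration gap:", gap_a, gap_b)
```

In this sketch both score vectors rank the examples identically, so the AUC alone cannot tell them apart, while the binned gap between mean score and observed frequency does differ.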
Are there any resources (articles, books, etc.) on the relationship between $P(y=1)$ and $\hat{y}$?
This question touches on the basics of classification models, and yet I couldn't find any good resources on the subject.