As a discussion from last year about spam/ham email classification shows, just because a model achieves perfect classification accuracy does not mean that it really knows what it is doing. In that example, every email with $P(\text{spam}) < 0.49$ is ham. That is ridiculous. If an email has a $49\%$ chance of being spam, sure, it is more likely to be ham than spam, but it should not be surprising to see some such emails turn out to be spam. In fact, that should happen almost half of the time.
Phrased in terms of baseball, a $0.300$ hitter probably won't get a hit in any given at-bat, but he does get a hit $30\%$ of the time. If your supposed $0.300$ hitter keeps failing to get hits, perhaps he is not really a $0.300$ hitter.
I have a model that I know has good calibration: I generated the data in a simulation and verified with rms::calibrate that the predicted probabilities almost perfectly match the true probabilities.
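To make the setup concrete, here is a minimal sketch of the kind of thing I mean; it is not my exact simulation, and the seed, sample size, and coefficients are placeholders. Outcomes are drawn from a known logistic model, the model is refit with rms::lrm, and the fit is checked with rms::calibrate.

```r
library(rms)

set.seed(2023)                           # placeholder seed
N <- 10000                               # placeholder sample size
x <- rnorm(N)
p_true <- plogis(-0.5 + 1.2 * x)         # true probabilities; coefficients are made up
y <- rbinom(N, size = 1, prob = p_true)

fit <- lrm(y ~ x, x = TRUE, y = TRUE)    # x = TRUE, y = TRUE are needed by calibrate()
cal <- calibrate(fit, B = 200)           # bootstrap calibration curve
plot(cal)                                # predicted vs observed probabilities
```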
However, when I try to verify the calibration myself, I fail. I cannot show that the correct proportion of $1$s lies below the various thresholds: $30\%$ should lie below a cutoff of $0.3$, $80\%$ below a cutoff of $0.8$, and so on.
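Concretely, the tabulation I am attempting looks roughly like this (continuing the sketch above, so `fit` and `y` are the same objects): for each cutoff, the observed fraction of $1$s among the observations on either side of it.

```r
p_hat <- predict(fit, type = "fitted")    # predicted probabilities from the lrm fit

cutoffs <- seq(0.1, 0.9, by = 0.1)
tab <- t(sapply(cutoffs, function(cc) c(
  cutoff          = cc,
  prop_ones_below = mean(y[p_hat < cc]),  # estimate of P(y = 1 | p_hat < c)
  prop_ones_above = mean(y[p_hat > cc])   # estimate of P(y = 1 | p_hat > c)
)))
round(tab, 3)
```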
I have reasoned through the problem using Bayes' rule, where $c$ is the cutoff:
$$ P\big(y = 1 \vert \hat y > c\big) = \dfrac {P\big(\hat y > c \vert y = 1\big)P\big(y = 1\big)} {P\big(\hat y > c\big)} $$
I figure that this, as a function of $c$, should equal the cutoff $c$.
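In code, the check I have in mind is roughly the following (again continuing the snippets above): estimate each term on the right-hand side from the simulated data and compare the resulting conditional probability to the cutoff $c$.

```r
bayes_check <- function(cc, p_hat, y) {
  lhs   <- mean(y[p_hat > cc])            # P(y = 1 | p_hat > c), estimated directly
  lik   <- mean(p_hat[y == 1] > cc)       # P(p_hat > c | y = 1)
  prior <- mean(y)                        # P(y = 1)
  evid  <- mean(p_hat > cc)               # P(p_hat > c)
  c(cutoff = cc, direct = lhs, bayes = lik * prior / evid)
}

t(sapply(cutoffs, bayes_check, p_hat = p_hat, y = y))
```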
Where have I gone awry?
(Perhaps my logic is sound and I just made a coding error. A Dave can dream, right? But let's focus on the logic; I'll take another shot at the code once I have the math figured out.)