After reading sklearn manual it was not very obvious for me to understand how Isotonic regression works in the case of probability calibration (using CalibratedClassifierCV).
I briefly read sklearn's github code for this model and some additional theory on the subject (this is a good source) and concluded that in this case we have the following algorithm.
(Below for $i=1,\ldots,N$: $y_i$'s are the true labels, $\hat p_i = \hat P(Y=1|x_i)$'s are the corresponding estimates of probabilities, given by some base classifier, and $w_i$'s are some positive sample weights; for simplicity let's consider binary classification.)
- Sort pairs $(\hat p_k, y_k)_{k=1}^N$ from calibration set by first argument, then by second argument (i.e. using $\text{np.lexsort}((y,\hat p))$ ) and get sorted pairs $(\hat p_i, y_i)_{i=1}^N$.
- Solve the following constrained optimization problem (i.e. fit Isotonic regression model):
$\displaystyle \text{minimize}~ \sum_{i=1}^N w_i (q_i - y_i)^2$ with respect to $\mathbf{q} \in \mathbb{R}^N$
$\text{subject to} ~~0 \le q_1 \le q_2 \le \ldots \le q_N \le 1$.
In sklearn this optimization problem is solved by using Pool Adjacent Violators Algorithm (PAVA), which is a linear time (and linear memory) $O(N)$ algorithm for linear ordering isotonic regression. - Define the "prediction function" $q(\hat p (x))$ as a linear interpolation on a set of points $(\hat p_1, q_1), \ldots, (\hat p_N, q_N)$. So, for every new point $\hat p_k \in [0,1]$ the prediction is $q(\hat p_k) \in [0,1]$.
So the only difference from the common usage of the Isotonic regression is that for the calibration task we use set of probabilities $\{\hat p_1, \ldots, \hat p_N\}$ (given by some base classifier) instead of a set of train samples $\{x_1, \ldots, x_N\}$ and have requirement $0 \le q \le 1$ (in common case $q \in \mathbb{R}$).
Also, when base classifier has decision function, we will use its values (such as $\{w^T x_1, \ldots, w^T x_N\}$ for linear base classifier) instead of $\{\hat p_1, \ldots, \hat p_N\}$.
I want to know, if my understanding is correct? Please correct me if I have mistakes. Thanks!