I'm trying to understand the adjusted Rand index for a metric review that I'm doing. I found this question most helpful so far:
I think I have a fairly good grasp of the Rand index; however, the idea of the "expected" Rand index is difficult for me to understand. The problem is that the whole definition uses combinations, while for me permutations would be much more intuitive (somewhat supported by the name, *permutation model*).
Using the Wikipedia formulation, my main goal is to clarify how this: $$ARI = \frac{ \sum_{ij} \binom{n_{ij}}{2} - [\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}] / \binom{n}{2} }{ \frac{1}{2} [\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}] - [\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}] / \binom{n}{2} },$$
corresponds to this
$$ARI = \frac{ RI - \mathbb{E}[RI] }{\max(RI)-\mathbb{E}[RI]}$$
which uses the contingency table definition
\begin{array}{c|cccc|c} {{} \atop X}\!\diagdown\!^Y & Y_1& Y_2& \cdots& Y_s& \text{sums} \\ \hline X_1& n_{11}& n_{12}& \cdots& n_{1s}& a_1 \\ X_2& n_{21}& n_{22}& \cdots& n_{2s}& a_2 \\ \vdots& \vdots& \vdots& \ddots& \vdots& \vdots \\ X_r& n_{r1}& n_{r2}& \cdots& n_{rs}& a_r \\ \hline \text{sums}& b_1& b_2& \cdots& b_s& \end{array}
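To make the notation concrete, here is a minimal Python sketch of the first formula evaluated directly on a contingency table. This is my own illustration, not code from Wikipedia; the function name `ari_from_table` is hypothetical, and the sketch assumes the table is a list of rows of non-negative integers with $n \ge 2$ and a non-zero denominator.

```python
from math import comb

def ari_from_table(table):
    """Adjusted Rand index from a contingency table given as a list of rows.

    Hypothetical helper: it evaluates the binomial formula term by term and
    does not guard against a zero denominator (e.g. both clusterings trivial).
    """
    a = [sum(row) for row in table]          # row sums a_i
    b = [sum(col) for col in zip(*table)]    # column sums b_j
    n = sum(a)                               # total number of objects

    sum_nij = sum(comb(x, 2) for row in table for x in row)
    sum_a = sum(comb(x, 2) for x in a)
    sum_b = sum(comb(x, 2) for x in b)

    expected = sum_a * sum_b / comb(n, 2)    # the "expected" term
    maximum = 0.5 * (sum_a + sum_b)          # the "max" term
    return (sum_nij - expected) / (maximum - expected)

print(ari_from_table([[2, 0], [0, 2]]))  # identical clusterings
print(ari_from_table([[1, 1], [1, 1]]))  # every cluster split evenly
```

Note that the quantities here are pair counts rather than the normalized $RI$; dividing numerator and denominator by $\binom{n}{2}$ does not change the ratio.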
Attempt 1: If we operate on the assumption that $r = s$ and that the correct clustering would result in a diagonal contingency table, then I can see that the true positives can be expressed as:
$$ TP = \binom{|X_1 \cap Y_1|}{2} + \dots = \sum_{i,j} \binom{|X_i \cap Y_j|}{2} = \sum_{i,j} \binom{n_{ij}}{2} $$
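Based on this interpretation, the pairs that share a row but not a cell are the FP, and the pairs that share a column but not a cell are the FN, i.e. $FP = \sum_i \binom{a_i}{2} - TP$ and $FN = \sum_j \binom{b_j}{2} - TP$. With $TN = \binom{n}{2} - TP - FP - FN$, this leads to the formula for the RI: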
$$RI = \frac{2 \sum_{i,j} \binom{n_{ij}}{2} - \sum_i \binom{a_{i}}{2} - \sum_j \binom{b_{j}}{2} + \binom{n}{2}}{\binom{n}{2}} $$
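As a sanity check on the RI formula above, here is a small Python sketch that compares it with a brute-force count of agreeing pairs. The function names and the example labels are my own, chosen only for illustration; it assumes the two clusterings are given as equal-length label lists.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index_pairs(x, y):
    """Rand index by checking every pair of objects directly."""
    pairs = combinations(range(len(x)), 2)
    # A pair agrees if both clusterings put it together, or both split it.
    agree = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agree / comb(len(x), 2)

def rand_index_formula(x, y):
    """Rand index from the contingency-table counts (binomial formula)."""
    n = len(x)
    tp = sum(comb(c, 2) for c in Counter(zip(x, y)).values())  # cells n_ij
    sum_a = sum(comb(c, 2) for c in Counter(x).values())       # row sums a_i
    sum_b = sum(comb(c, 2) for c in Counter(y).values())       # column sums b_j
    return (2 * tp - sum_a - sum_b + comb(n, 2)) / comb(n, 2)

# Hypothetical example labels for six objects.
x = [0, 0, 1, 1, 2, 2]
y = [0, 0, 1, 2, 2, 2]
print(rand_index_pairs(x, y), rand_index_formula(x, y))
```

For these labels, 12 of the 15 pairs agree, so both functions should give 0.8. This only checks that the pair-counting and binomial forms match; it does not depend on the table being diagonal.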