The conditions of a classification problem are as follows:
There are $K$ classes, which we predict from the $p$-dimensional predictor vector $X$. Let $Y$ be the $(K-1)$-dimensional one-hot encoded class indicator vector, with the $K^{th}$ class encoded as $Y=(0, \ldots, 0)$.
If $S_{11}$, $S_{22}$, and $S_{12}$ denote the sample covariance matrix of $X$, the sample covariance matrix of $Y$, and the sample cross-covariance matrix between $X$ and $Y$, respectively (each scaled by $N$), with $S_{21} = S_{12}^{T}$, and if $S_{B}$ is the between-class covariance matrix, I want to show that $$ S_{B}=N S_{12} S_{22}^{-1} S_{21}. $$
Direct computations are welcome, but I would also like some intuition about what the between-class covariance matrix is.
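For what it's worth, the identity can be sanity-checked numerically. The sketch below is only a check under my own assumptions, not a proof: I take "scaled by $N$" to mean that each covariance matrix is divided by $N$ after centering, and I take $S_B$ to be the unnormalized sum $\sum_k N_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^T$. The simulated data and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 60, 3, 4

# Balanced class labels 0..K-1 and predictors whose mean shifts with the class.
labels = np.arange(N) % K
X = rng.normal(size=(N, p)) + labels[:, None]

# (K-1)-dimensional one-hot encoding: the K-th class is the all-zeros row.
Y = np.eye(K)[labels][:, : K - 1]

# Centered data and covariance matrices divided by N (assumed convention).
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
S12 = Xc.T @ Yc / N
S22 = Yc.T @ Yc / N

# Right-hand side: N * S12 * S22^{-1} * S21, with S21 = S12^T.
rhs = N * S12 @ np.linalg.solve(S22, S12.T)

# Left-hand side: between-class scatter, assumed to be
# sum_k N_k (xbar_k - xbar)(xbar_k - xbar)^T.
xbar = X.mean(axis=0)
S_B = np.zeros((p, p))
for k in range(K):
    Xk = X[labels == k]
    d = Xk.mean(axis=0) - xbar
    S_B += len(Xk) * np.outer(d, d)

identity_holds = np.allclose(S_B, rhs)
print(identity_holds)
```

Under these conventions the right-hand side equals $X_c^T P X_c$, where $X_c$ is the centered data matrix and $P$ projects onto the span of the centered indicator columns. This suggests reading $S_B$ as the part of the scatter of $X$ explained by class membership, but I would still like to see this spelled out.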