I've been reading over this Multivariate Gaussian conditional proof, trying to make sense of how the mean and variance of a Gaussian conditional were derived. I've come to accept that unless I allocate a dozen or so hours to refreshing my linear algebra, it's out of my reach for the time being.
That being said, I'm looking for a conceptual explanation of what these equations represent:
$$\mu_{1|2} = \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2)$$
I read the first as: "Take $\mu_1$ and augment it by some factor, which is the covariance scaled by the precision (a measure of how closely $X_2$ is clustered about $\mu_2$, maybe?) and projected onto the distance of the specific $x_2$ from $\mu_2$."
$$\Sigma_{1|2} = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}$$
I read the second as: "Take the variance about $\mu_1$ and subtract some factor, which (in the scalar case) is the covariance squared, scaled by the precision about $x_2$."
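To make this concrete for myself, I wrote a small NumPy sketch (the numbers are made up) that plugs a toy 2-D covariance into both formulas and then sanity-checks them by keeping only the sampled points whose $x_2$ lands near the observed value:

```python
import numpy as np

# Toy 2-D Gaussian with made-up parameters: x = (x_1, x_2).
mu = np.array([1.0, 2.0])            # mu_1 = 1.0, mu_2 = 2.0
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])       # Sigma_{1,1}=2.0, Sigma_{1,2}=0.8, Sigma_{2,2}=1.0

x2_obs = 3.5                         # the observed value of x_2

# Conditional mean: mu_1 + Sigma_{1,2} Sigma_{2,2}^{-1} (x_2 - mu_2)
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2_obs - mu[1])

# Conditional variance: Sigma_{1,1} - Sigma_{1,2} Sigma_{2,2}^{-1} Sigma_{2,1}
var_cond = Sigma[0, 0] - Sigma[0, 1] / Sigma[1, 1] * Sigma[1, 0]
print(mu_cond, var_cond)             # 2.2 and 1.36

# Sanity check by sampling: among draws whose x_2 is close to 3.5,
# the mean and variance of x_1 should land near the values above.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mu, Sigma, size=500_000)
near = draws[np.abs(draws[:, 1] - x2_obs) < 0.05, 0]
print(near.mean(), near.var())
```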
In either case, the precision $\Sigma^{-1}_{2,2}$ seems to be playing a really important role.
A few questions:
- Am I right to treat precision as a measure of how closely observations are clustered about the expectation?
- Why does the covariance appear squared in the latter equation? (Is there a geometric interpretation?) So far, I've been treating $\Sigma_{1,2}\Sigma_{2,2}^{-1}$ as a ratio, $a/b$, so this ratio acts to scale the second covariance factor, essentially accounting for/damping the effect of the covariance; I don't know if this is valid (I try to check it numerically in the sketch after this list).
- Anything else you'd like to add/clarify?
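Regarding the ratio reading in the second bullet, here's the quick check I did (again with made-up numbers): as far as I can tell, $\Sigma_{1,2}\Sigma_{2,2}^{-1}$ comes out as the least-squares slope for predicting $x_1$ from $x_2$, and the subtracted term in the variance formula is the variance that regression explains:

```python
import numpy as np

# Same toy parameters as above.
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

rng = np.random.default_rng(1)
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
x1, x2 = draws[:, 0], draws[:, 1]

C = np.cov(x1, x2)                    # 2x2 empirical covariance matrix
slope = C[0, 1] / C[1, 1]             # empirical Sigma_{1,2} / Sigma_{2,2}
print(slope)                          # close to 0.8 / 1.0 = 0.8

# Residual variance after regressing out x_2: matches
# Sigma_{1,1} - Sigma_{1,2} Sigma_{2,2}^{-1} Sigma_{2,1} = 2.0 - 0.64 = 1.36
print(np.var(x1 - slope * x2))
```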