(This question differs from a similarly titled one because mine focuses instead on the analytical solution for mutual information as a function of correlation, its usage, and its seeming pointlessness as a replacement for correlation.)
The closed-form analytical solution for the mutual information (a scalar) between two jointly Gaussian random variables $X$ and $Y$ is $$I(X,Y) = -\frac{1}{2}\ln\det(R) = -\frac{1}{2}\ln(1-\rho^2),$$ where $R$ is the $2 \times 2$ correlation matrix of $(X, Y)$ and $\rho$ is the Pearson correlation coefficient between them (the off-diagonal entry of $R$, so $\det(R) = 1 - \rho^2$).
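As a sanity check, here is a minimal sketch that confirms the closed form numerically. It assumes scikit-learn's kNN-based `mutual_info_regression` is an acceptable stand-in for the true mutual information; the sample size and seed are arbitrary choices of mine:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho = 0.8
n = 20_000

# Draw n samples of jointly Gaussian (X, Y) with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

closed_form = -0.5 * np.log(1.0 - rho**2)  # analytical Gaussian MI, in nats
estimate = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"closed form : {closed_form:.4f} nats")  # ~0.5108
print(f"kNN estimate: {estimate:.4f} nats")     # should land close to it
```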
- If $I(X,Y)$ can pick up non-linear dependencies that correlation can't, why is none of that visible in the formula above, which depends on nothing but $\rho$?
- If $I(X,Y)$ is, as the formula shows, just a re-expression of correlation, what is the point of switching from correlation to mutual information at all?
- Is the advantage of $I(X,Y)$, capturing dependencies that correlation misses, only apparent for variables that are not jointly Gaussian, where the analytical solution above does not apply? (A small simulation of this case is sketched after this list.)
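To make the last bullet concrete, here is a hedged sketch of a non-Gaussian case: $Y = X^2$ plus a little noise has (near-)zero correlation with $X$, yet clearly positive mutual information, and $(X, Y)$ is not jointly Gaussian, so the closed form above does not apply. It again assumes the kNN estimator as a stand-in for the true MI:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 20_000

x = rng.standard_normal(n)
y = x**2 + 0.1 * rng.standard_normal(n)  # nonlinear dependence, slight noise

corr = np.corrcoef(x, y)[0, 1]  # ~0: E[X^3] = 0 kills the linear signal
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"correlation: {corr:+.4f}")   # near zero
print(f"MI estimate: {mi:.4f} nats") # clearly positive
```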