(This question differs from a similarly titled one because mine focuses instead on the analytical solution for mutual information as a function of correlation, its usage, and its seeming pointlessness as a replacement for correlation.)
The closed-form analytical solution for the mutual information (a scalar) between two jointly Gaussian random variables $X$ and $Y$ is $$I(X,Y) = -\frac{1}{2}\ln\det(R) = -\frac{1}{2}\ln(1-\rho^2),$$ where $R$ is the $2 \times 2$ correlation matrix of $(X, Y)$ and $\rho$ is the Pearson correlation coefficient between them (the off-diagonal entry of $R$, so $\det(R) = 1 - \rho^2$).
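As a sanity check, here is a minimal sketch that confirms the closed form numerically. It assumes scikit-learn's kNN-based `mutual_info_regression` is an acceptable stand-in for the true mutual information; the sample size and seed are arbitrary choices of mine:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
rho = 0.8
n = 20_000

# Draw n samples of jointly Gaussian (X, Y) with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

closed_form = -0.5 * np.log(1.0 - rho**2)  # analytical Gaussian MI, in nats
estimate = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"closed form : {closed_form:.4f} nats")  # ~0.5108
print(f"kNN estimate: {estimate:.4f} nats")     # should land close to it
```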
- If $I(X,Y)$ can pick up non-linear dependencies that correlation can't, why is none of that visible in the formula above, which depends on nothing but $\rho$?
- If $I(X,Y)$ is, as the formula shows, just a re-expression of correlation, what is the point of switching from correlation to mutual information at all?
- Is the advantage of $I(X,Y)$, capturing dependencies that correlation misses, only apparent for variables that are not jointly Gaussian, where the analytical solution above does not apply? (A small simulation of this case is sketched after this list.)
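To make the last bullet concrete, here is a hedged sketch of a non-Gaussian case: $Y = X^2$ plus a little noise has (near-)zero correlation with $X$, yet clearly positive mutual information, and $(X, Y)$ is not jointly Gaussian, so the closed form above does not apply. It again assumes the kNN estimator as a stand-in for the true MI:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 20_000

x = rng.standard_normal(n)
y = x**2 + 0.1 * rng.standard_normal(n)  # nonlinear dependence, slight noise

corr = np.corrcoef(x, y)[0, 1]  # ~0: E[X^3] = 0 kills the linear signal
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"correlation: {corr:+.4f}")   # near zero
print(f"MI estimate: {mi:.4f} nats") # clearly positive
```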