
The mutual information seems to be quite an interesting measure of the relationship between variables. As such, I want to use it to investigate the relationship between two continuous variables $X$ and $Y$, for which I only have a hundred observations. In particular, I would like to obtain a normed version of the mutual information that equals $1$ in the case of perfect dependence. I guess this means that the entropies of $X$ and $Y$ also need to be estimated.

After doing some research, I realized that estimating the (unnormalized) mutual information of two continuous variables is highly nontrivial. As a result, multiple competing approaches exist. Khan et al. (2007) provide an overview of some of them, compare several approaches under different settings, and make recommendations on when to use which approach. However, that paper is already 12 years old, and new estimators have been developed since then, for example Belghazi et al. (2018). So, can anybody active in this field recommend which estimator is currently preferred, and in which situation? Ideally, I would also like to obtain a confidence interval for the normed mutual information.

Neil Traft
Julian Karch

1 Answer


I am not sure I understand why this should be a very hard problem, at least in such a low-dimensional setting as you describe. I am not active in the fields of those who have authored the articles you cite, but I do not see why this could not be framed as a relatively simple statistical problem. One idea you could (perhaps) follow is to relate it to the copula of $X$ and $Y$. The mutual information of $X$ and $Y$ is the Kullback–Leibler divergence between their actual joint density $f(x, y)$ and their joint density under the assumption of independence, $f^*(x, y) = f(x)g(y)$. If you write $$F_{X,Y}(x, y) = C(F(x), G(y)),$$ where $C$ is called a copula (and this representation is unique for a continuous bivariate random vector), so that $$f(x, y) = c(F(x), G(y))\,f(x)\,g(y),$$ where $c(u, v) = \frac{\partial^2 C}{\partial u\,\partial v}(u, v)$ is the copula density of $X$ and $Y$, then their mutual information can be written as \begin{align*} I(X, Y) &= \underset{\mathbb{R}^2}{\int\int}\log\left(\frac{f(x, y)}{f(x)g(y)}\right)f(x, y)\,dx\,dy\\ &=\underset{\mathbb{R}^2}{\int\int}\log\left(c(F(x), G(y))\right)c(F(x), G(y))f(x)g(y)\,dx\,dy\\ &=\underset{[0,1]^2}{\int\int}\log\left(c(u, v)\right)c(u, v)\,du\,dv\\ &= \mathbb{E}_{C}\left(\log\left(c(U, V)\right)\right). \end{align*}
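As a quick sanity check of the last expression (a standard textbook case, not an assumption about your data): for the Gaussian copula $C_\rho$ with parameter $\rho$,
$$\mathbb{E}_{C_\rho}\left(\log c_\rho(U, V)\right) = -\tfrac{1}{2}\log\left(1 - \rho^2\right),$$
which is exactly the mutual information of a bivariate normal vector with correlation $\rho$. This is consistent with mutual information being invariant under strictly increasing transformations of the marginals, so that it depends on the joint distribution only through the copula.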

I would suggest using the semiparametric approach where you first compute the so-called pseudo-observations $\hat F(x) = \frac{1}{n+1}\sum_{i=1}^n I(x_i \le x),$ $\hat G(y) = \frac{1}{n+1}\sum_{i=1}^n I(y_i \le y),$ and then try to find some parametric copula $C_\theta$ that fits $(U^*, V^*) = (\hat F(X), \hat G(Y))$ well. Then you can estimate the mutual information by computing the integral above by numerical integration or Monte Carlo methods, replacing $c$ and $C$ by $c_{\hat\theta}$ and $C_{\hat\theta}$. If you estimate $I(X, Y)$ by sampling from the estimated copula, you could get a confidence interval by repeating this estimation based on, say, $100$ samples from $C_{\hat\theta}$ each time, and then using empirical quantiles of these estimates.
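To make the recipe concrete, here is a minimal sketch in Python of what this could look like, assuming (purely for illustration) that the parametric family $C_\theta$ is the Gaussian copula, fitted by correlating the normal scores of the pseudo-observations. The example data and all helper names are made up for the sketch; for another parametric family you would swap in its density and sampler, but the structure stays the same.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def pseudo_obs(x):
    """Pseudo-observations: ranks scaled by 1/(n+1) so they stay inside (0, 1)."""
    return stats.rankdata(x) / (len(x) + 1)

def fit_gauss_copula(u, v):
    """Fit the Gaussian copula parameter rho as the correlation of the normal scores."""
    z1, z2 = stats.norm.ppf(u), stats.norm.ppf(v)
    return np.corrcoef(z1, z2)[0, 1]

def mi_gauss_copula(rho, n_mc=100_000):
    """Monte Carlo estimate of E_C[log c(U, V)] under a Gaussian copula with parameter rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_mc)
    log_c = (stats.multivariate_normal(mean=[0.0, 0.0], cov=cov).logpdf(z)
             - stats.norm.logpdf(z[:, 0]) - stats.norm.logpdf(z[:, 1]))
    return log_c.mean()

# Example data: 100 observations with a nonlinear but monotone dependence.
x = rng.normal(size=100)
y = np.exp(x) + 0.5 * rng.normal(size=100)

u, v = pseudo_obs(x), pseudo_obs(y)
rho_hat = fit_gauss_copula(u, v)
mi_hat = mi_gauss_copula(rho_hat)

# Parametric-bootstrap CI: resample n pairs from the fitted copula, refit via ranks,
# and re-estimate the mutual information each time.
boot = []
for _ in range(200):
    zb = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_hat], [rho_hat, 1.0]], size=len(x))
    rho_b = fit_gauss_copula(pseudo_obs(zb[:, 0]), pseudo_obs(zb[:, 1]))
    boot.append(mi_gauss_copula(rho_b, n_mc=20_000))
ci = np.quantile(boot, [0.025, 0.975])

print(f"MI estimate: {mi_hat:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

For the Gaussian family specifically, the integral has the closed form $-\frac{1}{2}\log(1-\hat\rho^2)$ noted above, which gives a handy check on the Monte Carlo step; the sampling-based version is what carries over to other copula families.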

I am not sure about a normed mutual information. I do not know whether it is possible to compute useful bounds on the mutual information, and I am not sure how to compute the mutual information between two perfectly dependent random variables: this would correspond to computing the expectation of the log-copula density of either $M(u, v) = \min(u, v)$ or $W(u, v) = \max(u + v - 1, 0)$, and these densities do not exist because those copulas are not absolutely continuous probability measures. (For instance, under the Gaussian copula the mutual information $-\frac{1}{2}\log(1-\rho^2)$ diverges as $\rho \to 1$.)

Simon Boge Brant
  • Thanks for the elaborate answer!! I guess by "highly nontrivial" I meant that finding an optimal estimator seems to be hard, as there are so many alternatives. Finding an estimator seems to be relatively easy, as you have just demonstrated. I have to admit that I can't fully follow your suggestion, as I am not familiar with copulas. However, using copulas for estimation was also suggested by https://www.sciencedirect.com/science/article/pii/S1007021411700086#cesec40. Other suggestions include using kernel density estimation and binning. The question is which of those should be preferred. – Julian Karch Feb 13 '19 at 14:14
  • What advantages does this method have over simply trying to estimate the joint density by kernel density estimation? – adityar Feb 15 '19 at 12:10
  • It does not necessarily have any advantage over kernel density estimation. However, kernel density estimation tends to require a lot of data, especially if the data come from a random variable without bounded support. Therefore, if the model assumptions you make to turn this into a simpler one- (or maybe two-) parameter estimation problem are not too restrictive, you could get a better estimate. – Simon Boge Brant Feb 18 '19 at 12:30
  • is it correct to say that **mutual information** measures the **entropy of a copula**? – develarist Jul 30 '20 at 01:35
  • @develarist I found a paper [(Ma, Jian, and Zengqi Sun. "Mutual information is copula entropy." Tsinghua Science & Technology 16.1 (2011): 51-54.)](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6077935) that explores this relationship, and I posted two questions about it [1](https://stats.stackexchange.com/questions/510992/copula-entropy-calculation-is-borked) [2](https://stats.stackexchange.com/questions/511088/mutual-information-relationship-to-copula-entropy-is-borked). – Dave Jul 16 '21 at 16:28