
I've worked with mutual information for some time. But I recently found a measure in the "correlation world" that can also be used to assess the independence of distributions, the so-called "distance correlation" (also termed Brownian correlation): http://en.wikipedia.org/wiki/Brownian_covariance. I checked the papers where this measure is introduced, but without finding any allusion to mutual information.

So, my questions are:

  • Do they solve exactly the same problem? If not, how are the problems different?
  • And if the answer to the previous question is yes, what are the advantages of using one or the other?
  • Try writing down 'distance correlation' and 'mutual information' explicitly for a simple example (the two definitions are written out after this comment thread). In the second case you will get logarithms, while in the first you will not. – Piotr Migdal Jan 10 '12 at 11:52
  • @PiotrMigdal Yes, I'm aware of that difference. Could you please explain why it is important? Please take into account that I'm not a statistician... – dsign Jan 10 '12 at 11:56
  • For me, the standard tool for measuring the mutual dependence of probability distributions is mutual information. It has a lot of nice properties and its interpretation is straightforward. However, there may be specific problems where distance correlation is preferred (but I have never used it in my life). So what is the problem you are trying to solve? – Piotr Migdal Jan 10 '12 at 14:53
  • This comment is a few years late, but Columbia University's Statistics Dept made the academic year 2013-2014 a year of focus on measures of dependence. In April-May 2014, a workshop was held that brought together the top academics doing work in this field, including the Reshef brothers (MIC), Gabor Szekely (distance correlation), and Subhadeep Mukhopadhyay, to name a few. Here's a link to the program, which includes many PDFs from the presentations: http://dependence2013.wikischolars.columbia.edu/Nonparametric+measures+of+dependence+workshop – Mike Hunter Oct 15 '15 at 11:24
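
To make the first comment above concrete, here are the two quantities in their standard forms: the mutual information of two discrete variables, and the sample distance correlation as defined by Székely, Rizzo, and Bakirov (2007). For discrete variables,

$$I(X;Y)=\sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.$$

For samples $(x_1,y_1),\dots,(x_n,y_n)$, form the pairwise distance matrices $a_{jk}=\|x_j-x_k\|$ and $b_{jk}=\|y_j-y_k\|$, double-center each (subtract row and column means, add back the grand mean) to get $A_{jk}$ and $B_{jk}$, and set

$$\mathrm{dCov}_n^2(X,Y)=\frac{1}{n^2}\sum_{j,k=1}^{n} A_{jk}B_{jk},\qquad \mathrm{dCor}_n(X,Y)=\frac{\mathrm{dCov}_n(X,Y)}{\sqrt{\mathrm{dVar}_n(X)\,\mathrm{dVar}_n(Y)}},$$

where $\mathrm{dVar}_n(X)=\mathrm{dCov}_n(X,X)$. As the comment notes, the logarithm appears only in the mutual information; the distance correlation is built entirely from pairwise distances.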

1 Answer


Information / mutual information does not depend on the possible values; it depends only on the probabilities, and is therefore less sensitive. Distance correlation is more powerful and simpler to compute. For a comparison, see

http://www-stat.stanford.edu/~tibs/reshef/comment.pdf
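
For readers who want to experiment, here is a minimal sketch of the sample distance correlation statistic, following the definition in Székely et al. (2007); the function and variable names are my own, and only the 1-D case is handled:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (a sketch).

    Builds two n x n distance matrices, so memory is O(n^2);
    this is the scaling issue raised in the comments below.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distance matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row/column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()      # squared sample distance covariance
    dvar_x = (A * A).mean()     # squared sample distance variance of x
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return 0.0 if denom == 0 else float(np.sqrt(dcov2 / denom))

# A dependence that Pearson correlation misses but dCor detects:
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x ** 2                            # dependent, yet uncorrelated
print(distance_correlation(x, y))     # clearly positive
print(np.corrcoef(x, y)[0, 1])        # close to 0
```

The quadratic example at the end illustrates the "more powerful" claim: the Pearson correlation of $x$ and $x^2$ is essentially zero for a symmetric $x$, while the distance correlation is clearly bounded away from zero.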

  • Hi, thanks for your answer! The paper you refer to is about MIC, which I believe is a bit more than MI. I have implemented the distance correlation measure, and I don't think it is simpler than MI for the elementary case of discrete categorical variables. Then again, one thing I have learned is that DCM is well defined and well behaved for continuous variables, whereas with MI you need to do binning or fancy stuff à la MIC. – dsign Feb 24 '12 at 14:05
  • However, DCM seems to need square matrices whose side is the number of samples; in other words, space complexity scales quadratically. Or at least that's my impression; I would be glad to be mistaken. MIC does better, because you can tune it into some sort of compromise between precision and performance (a binning sketch follows this comment thread). – dsign Feb 24 '12 at 14:14
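
To illustrate the binning point raised in these comments, here is a naive plug-in estimate of mutual information for continuous data; this is a sketch only, and the bin count (20 here, an arbitrary choice) directly affects the estimate:

```python
import numpy as np

def mi_binned(x, y, bins=20):
    """Plug-in mutual information estimate (in nats) via a 2-D
    histogram. Memory scales with bins**2 rather than n**2, but
    the result is sensitive to the binning choice."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                  # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y (row vector)
    nz = pxy > 0                           # skip empty cells: 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Independence yields an estimate near zero; the `bins` parameter controls the trade-off between resolution and estimation noise, which is exactly the kind of tuning the comments note MI requires for continuous data, in contrast to the tuning-free but O(n²)-memory distance correlation sketch above.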