
I've just been introduced (vaguely) to Brownian/distance covariance/correlation. It seems particularly useful for testing dependence in many non-linear situations. But it doesn't seem to be used very often, even though plain covariance/correlation are often applied to non-linear/chaotic data.

That has me thinking that there might be some drawbacks to distance covariance. So what are they, and why doesn't everyone just always use distance covariance?

naught101
  • For reference, I created a [distance correlation version](https://en.wikipedia.org/wiki/Distance_correlation) of the [correlation graph on wikipedia](https://en.wikipedia.org/wiki/Correlation) – naught101 Mar 02 '12 at 03:44
  • I read that you were using dcov to compare non-linear time series and combine them with weights. I was wondering: did you use a weighted distance covariance, i.e. give different weights to your data via a weight vector when calculating the distance correlation? I'm trying to do that, but I'm not sure if introducing a weight vector into the distance correlation formulas is the right way to go. – user3757561 Jun 20 '14 at 17:05
  • No, sorry @user3757561, I was just trying distance correlation as a replacement for correlation, and then creating weights based on that. But I didn't end up using it anyway... – naught101 Jun 23 '14 at 13:25

2 Answers


I have tried to collect a few remarks on distance covariance based on my impressions from reading the references listed below. However, I do not consider myself an expert on this topic. Comments, corrections, suggestions, etc. are welcome.

The remarks are (strongly) biased towards potential drawbacks, as requested in the original question.

As I see it, the potential drawbacks are as follows:

  1. The methodology is new. My guess is that this is the single biggest factor regarding lack of popularity at this time. The papers outlining distance covariance start in the mid-2000s and progress up to the present day. The 2009 paper cited below is the one that received the most attention (hype?), and it is less than three years old. In contrast, the theory and results on correlation and correlation-like measures have over a century of work already behind them.
  2. The basic concepts are more challenging. Pearson's product-moment correlation, at an operational level, can be explained to college freshmen without a calculus background pretty readily. A simple "algorithmic" viewpoint can be laid out and the geometric intuition is easy to describe. In contrast, in the case of distance covariance, even the notion of sums of products of pairwise Euclidean distances is quite a bit more difficult, and the notion of covariance with respect to a stochastic process goes far beyond what could reasonably be explained to such an audience.
  3. It is computationally more demanding. The basic algorithm for computing the test statistic is $O(n^2)$ in the sample size as opposed to $O(n)$ for standard correlation metrics. For small sample sizes this is not a big deal, but for larger ones it becomes more important.
  4. The test statistic is not distribution-free, even asymptotically. One might hope that, for a test statistic consistent against all alternatives, the null distribution, at least asymptotically, would not depend on the underlying distributions of $X$ and $Y$. This is not the case for distance covariance: the distribution under the null depends on the underlying distributions of $X$ and $Y$ even as the sample size tends to infinity. It is true that the distributions are uniformly bounded by a $\chi^2_1$ distribution, which allows for the calculation of a conservative critical value.
  5. The distance correlation is a one-to-one transform of $|\rho|$ in the bivariate normal case. This is not really a drawback, and might even be viewed as a strength. But, if one accepts a bivariate normal approximation to the data, which can be quite common in practice, then little, if anything, is gained from using distance correlation in place of standard procedures.
  6. Unknown power properties. Being consistent against all alternatives essentially guarantees that distance covariance must have very low power against some alternatives. In many cases, one is willing to give up generality in order to gain additional power against particular alternatives of interest. The original papers show some examples in which they claim high power relative to standard correlation metrics, but I believe that, going back to (1.) above, its behavior against alternatives is not yet well understood.
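To make the $O(n^2)$ cost in point 3 concrete: the sample statistic is built from full pairwise distance matrices that are then double-centered. Below is a minimal NumPy sketch of the (biased, V-statistic) sample distance correlation for univariate data, following the double-centering recipe in the references; the function name is mine, not from any library.

```python
import numpy as np

def distance_correlation(x, y):
    """Naive O(n^2) sample distance correlation for 1-D arrays x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise Euclidean distance matrices -- this is the O(n^2) step.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row means and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Squared sample distance covariance and distance variances.
    dcov2 = max((A * B).mean(), 0.0)  # clamp tiny negative rounding error
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0
```

For example, with symmetric $x$ and $y = x^2$ the Pearson correlation is essentially zero while the distance correlation is clearly positive, which is exactly the behavior the original question is asking about.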

To reiterate, this answer probably comes across quite negative. But, that is not the intent. There are some very beautiful and interesting ideas related to distance covariance and the relative novelty of it also opens up research avenues for understanding it more fully.
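For the bivariate normal case in point 5, the one-to-one relationship can be written explicitly. If I am reading Szekely, Rizzo and Bakirov (2007) correctly, for a standard bivariate normal pair $(X, Y)$ with correlation $\rho$,

$$\mathcal{R}^2(X, Y) = \frac{\rho \arcsin \rho + \sqrt{1 - \rho^2} - \rho \arcsin(\rho/2) - \sqrt{4 - \rho^2} + 1}{1 + \pi/3 - \sqrt{3}},$$

which is even in $\rho$, equals $0$ at $\rho = 0$ and $1$ at $|\rho| = 1$, and is monotone increasing in $|\rho|$.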

References:

  1. G. J. Szekely and M. L. Rizzo (2009), Brownian distance covariance, Ann. Appl. Statist., vol. 3, no. 4, 1236–1265.
  2. G. J. Szekely, M. L. Rizzo and N. K. Bakirov (2007), Measuring and testing independence by correlation of distances, Ann. Statist., vol. 35, 2769–2794.
  3. R. Lyons (2012), Distance covariance in metric spaces, Ann. Probab. (to appear).
cardinal
  • Excellent answer, thank you. Some of it is a bit over my head, but I think I'll be able to remedy that myself :) – naught101 Mar 28 '12 at 00:35
  • See also: Summary and discussion of “Brownian Distance Covariance”, Statistics Journal Club 36-825, Benjamin Cowley and Giuseppe Vinci, October 27, 2014, http://www.stat.cmu.edu/~ryantibs/journalclub/dcov.pdf – Felipe G. Nievinski Jan 05 '17 at 04:49
  • When both random variables are univariate the distance correlation can be computed in $\mathcal{O}(n \log n)$ time, see https://www.tandfonline.com/doi/abs/10.1080/00401706.2015.1054435 for example. – Arin Chaudhuri Feb 25 '19 at 18:10

I could well be missing something, but just having a quantification of the nonlinear dependence between two variables doesn't seem to have much of a payoff. It won't tell you the shape of the relationship. It won't give you any means to predict one variable from the other. By analogy, when doing exploratory data analysis one sometimes uses a loess curve (locally weighted scatterplot smoother) as a first step towards seeing whether the data are best modeled with a straight line, a quadratic, a cubic, etc. But the loess in and of itself is not a very useful predictive tool. It's just a first approximation on the way to finding a workable equation to describe a bivariate shape. That equation, unlike the loess (or the distance covariance result), can form the basis of a confirmatory model.

rolando2
  • For my purposes, it does have a payoff. I'm not using dcov() for predicting anything, rather, comparing multiple non-linear time-series in an ensemble, and combining them with weights based on their dependence. In this situation, dcov() has potentially large benefits. – naught101 Mar 25 '12 at 00:08
  • @naught101 Can you give some more info on what you mean by 'combine'? This sounds interesting to me in terms of weighting based on non-linear dependence. Do you mean categorizing the time series into groups? Also, what do high and low weights emphasize in this scenario? – hearse Mar 28 '12 at 15:25
  • @PraneethVepakomma: check out my answer at http://stats.stackexchange.com/questions/562/when-to-use-multiple-models-for-prediction/25127#25127 – naught101 Mar 28 '12 at 23:06
  • Also, if you know the general form of dependence (e.g., polynomial equation), then you may quantify the strength of the dependence using the coefficient of determination, see, e.g., [Computing Adjusted R2 for Polynomial Regressions](https://www.mathworks.com/help/matlab/data_analysis/linear-regression.html#bswinlz) – Felipe G. Nievinski Jan 05 '17 at 05:22