
I'm looking for the ideal independence test for two variables of unknown distribution, i.e. a non-parametric test. I would choose between alternatives based on statistical power.

A few options that come to mind are Kendall's tau and a test based on Spearman's rho. Chi-squared is also an option, although its power depends on an additional parameter (the binning), which makes it less compelling.
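For reference, the rank-based tests above can be run directly from SciPy; a minimal sketch on simulated data (the data-generating process is my own illustration, not anything specific to my problem):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.2 * x**3 + rng.normal(size=n)   # a weak monotonic signal plus noise

# Both tests only use the rankings, so the marginal distributions don't matter.
rho, p_spearman = stats.spearmanr(x, y)
tau, p_kendall = stats.kendalltau(x, y)
print(f"Spearman rho={rho:.3f} (p={p_spearman:.1e})")
print(f"Kendall tau={tau:.3f} (p={p_kendall:.1e})")
```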

I wonder what other options are out there and how they compare in terms of statistical power.

To give more color: I work in trading/finance, where forecasting relationships tend to be so weak that they are difficult to find. Once they grow stronger, they quickly weaken again, in keeping with efficient markets and no free lunch.

Relationships between the variables do not have to be monotonic, but continuity of the conditional mean, or a finite number of discontinuities, is reasonable to assume.

We can assume i.i.d. observations; typical sample sizes are 250–2500.

Mark Horvath
  • What's "strong independence"? (or if you meant a strong test of independence, what's a "strong test"?). What do you mean by "strength"? Are you talking about power? Against what alternatives? – Glen_b Oct 03 '15 at 17:35
  • @Glen_b, yes, I mean power, thanks for noting. I hope it's easier to interpret now; I'm not sure about your last question though. – Mark Horvath Oct 03 '15 at 17:45
  • There's an infinite number of ways for things to be dependent. Measures like Spearman and Kendall correlations measure particular (and quite similar) types of dependence (and so have good power in tests when the form of dependence is like the kind of thing they measure) -- but there's all manner of dependence that neither of them pick up. So the question is ... what kinds of dependence do you want power to identify? The kind(s) that those two things can pick up? Or something else, perhaps something more general or something more specific? – Glen_b Oct 03 '15 at 18:19
  • I completely agree, we can't have a test which finds all kinds of dependencies; I'm looking for ideas, though, which often work well in practice. For example, it happens very rarely that a test of Pearson correlation would result in a stronger p-value than Spearman, hence I prefer Spearman (unless I really need to test only linear correlation, which is not the case now). Because the distributions are unknown, Spearman and Kendall work fine as they only look at rankings. I wonder if there is anything more interesting here... – Mark Horvath Oct 03 '15 at 19:25
  • You could have a sequence of tests which as n grows will have power against any form of dependence, but the problem is it will tend to have relatively lower power than one suited to the kind of dependence you want power against. When you say "I wonder if there is anything more interesting here", you still have not identified what kinds of performance would be interesting to you. *No* test will "work well in practice" against dependence it doesn't have power against, and so again, *what forms of dependence do you want power against*? ... ctd – Glen_b Oct 04 '15 at 00:52
  • (ctd)... For example are you *only* interested in monotonic association? Or would more general functional association be important -- e.g. if the relationship was first increasing and then decreasing, would that be important to find, or not? Or ...what if there was a mix of two subpopulations of roughly equal size, one with a linear increasing association, one with a linear decreasing association (producing an "X" shaped relationship between the two variables), would that be important to find? ... and so on. ... We can't tell you what things you want power against. – Glen_b Oct 04 '15 at 00:56
  • Out of curiosity, do you have a typical sort of sample size? My advice at n=30 would often tend to be quite different from advice at n=30000. – Glen_b Oct 04 '15 at 02:56
  • Hi @Glen_b, useful classifications indeed... To give more color, I work in trading/finance, where forecasting patterns tend to be so weak that they are difficult to find. Once they grow stronger, they quickly weaken again, in keeping with efficient markets and no free lunch. Unfortunately relationships do not have to be monotonic, but continuity of the conditional mean or a finite number of discontinuities is reasonable to assume. We can assume i.i.d. observations and a typical sample size of 250–2500. – Mark Horvath Oct 04 '15 at 22:28
  • Mark -- that's important information that would be good to include in your question. – Glen_b Oct 05 '15 at 03:12

2 Answers


You may be interested in Hoeffding's independence test, which can be calculated using the R function hoeffd in the Hmisc package, and uses a test statistic resembling that of the Cramér–von Mises goodness-of-fit test. The test is consistent, provided we restrict the alternative hypothesis to the case where the two variables are dependent with a continuous joint distribution function; in other words, it is "powerful" against all such alternatives, given a large enough sample.
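To show what the statistic measures, here is a rough NumPy sketch of the classical D statistic, scaled as in Hmisc's hoeffd so that complete dependence gives values near 1 (this assumes continuous data with no ties and n ≥ 5; the function name and simulated data are my own, and for real use hoeffd also supplies p-values):

```python
import numpy as np
from scipy.stats import rankdata

def hoeffding_d(x, y):
    """Hoeffding's D statistic (no-ties formula); requires n >= 5."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    R = rankdata(x)   # marginal ranks of x
    S = rankdata(y)   # marginal ranks of y
    # Q_i = 1 + number of points strictly below (x_i, y_i) in both coordinates
    Q = 1.0 + np.array([np.sum((x < x[i]) & (y < y[i])) for i in range(n)])
    D1 = np.sum((Q - 1) * (Q - 2))
    D2 = np.sum((R - 1) * (R - 2) * (S - 1) * (S - 2))
    D3 = np.sum((R - 2) * (S - 2) * (Q - 1))
    return 30.0 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) / (
        n * (n - 1) * (n - 2) * (n - 3) * (n - 4))

rng = np.random.default_rng(1)
x = rng.normal(size=300)
d_dep = hoeffding_d(x, x)                    # complete dependence: D = 1
d_ind = hoeffding_d(x, rng.normal(size=300)) # independence: D near 0
print(d_dep, d_ind)
```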

Another possibility is to construct a contingency table by splitting the range of each variable into bins and then applying the $\chi^2$ test for independence.
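A sketch of that binned approach with SciPy, using quantile bins so the marginal cell counts are balanced (the helper name, bin count, and simulated data are illustrative choices, not part of any package):

```python
import numpy as np
from scipy import stats

def chi2_independence(x, y, k=5):
    """Chi-squared test on a k-by-k quantile-binned contingency table."""
    # Interior quantiles as bin edges; digitize maps each point to a bin 0..k-1.
    xb = np.digitize(x, np.quantile(x, np.linspace(0, 1, k + 1)[1:-1]))
    yb = np.digitize(y, np.quantile(y, np.linspace(0, 1, k + 1)[1:-1]))
    table = np.zeros((k, k))
    np.add.at(table, (xb, yb), 1)   # cross-tabulate the bin pairs
    chi2, p, dof, _ = stats.chi2_contingency(table)
    return chi2, p

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
# V-shaped (non-monotonic) dependence that rank correlations largely miss:
stat, p = chi2_independence(x, np.abs(x) + rng.normal(size=1000))
print(stat, p)
```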

Brent Kerby
  • Hoeffding's test just fits the purpose. More ideas here: http://stats.stackexchange.com/questions/73646/how-do-i-test-that-two-continuous-variables-are-independent – Mark Horvath Oct 10 '15 at 13:49

The test described in Heller, Heller and Gorfine (2012) detects any form of dependence and is powerful in many dependence situations. It is implemented in the R package 'HHG', function 'hhg.test'. Documentation here: https://cran.r-project.org/web/packages/HHG/HHG.pdf

Another popular method is the distance covariance of Székely and Rizzo (2009), implemented in the R package 'energy', function 'dcov.test'. Documentation found here: https://cran.r-project.org/web/packages/energy/energy.pdf
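dcov.test handles this natively; the following NumPy sketch just illustrates the idea behind it (double-centre the pairwise distance matrices, take the mean of their elementwise product as the squared distance covariance, and get a p-value by permutation). The function names and simulated data are my own:

```python
import numpy as np

def centred_dist(v):
    """Double-centred pairwise distance matrix of a 1-D sample."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

def dcov_perm_test(x, y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    A, B = centred_dist(x), centred_dist(y)
    obs = (A * B).mean()   # sample squared distance covariance
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(y))
        # Permuting rows and columns of B together is equivalent to
        # re-centring the distance matrix of the permuted sample.
        if (A * B[np.ix_(idx, idx)]).mean() >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
x = rng.normal(size=300)
# Purely non-monotonic dependence, invisible to Spearman/Kendall:
obs, p = dcov_perm_test(x, x**2 + 0.5 * rng.normal(size=300))
print(obs, p)
```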

  • Thanks! HHG: https://arxiv.org/abs/1201.3522 and Székely, Rizzo: https://projecteuclid.org/download/pdfview_1/euclid.aoas/1267453933 in more detail... – Mark Horvath Jun 04 '16 at 10:41