
I have two non-parametric rank correlation matrices, emp and sim (for example, based on Spearman's $\rho$ rank correlation coefficient):

library(fungible)
emp <- matrix(c(
1.0000000, 0.7771328, 0.6800540, 0.2741636,
0.7771328, 1.0000000, 0.5818167, 0.2933432,
0.6800540, 0.5818167, 1.0000000, 0.3432396,
0.2741636, 0.2933432, 0.3432396, 1.0000000), 4, 4)

# generate a sample correlation from population 'emp' with n = 25
sim <- corSample(emp, n = 25)
sim$cor.sample

          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.7221496 0.7066588 0.5093882
[2,] 0.7221496 1.0000000 0.6540674 0.5010190
[3,] 0.7066588 0.6540674 1.0000000 0.5797248
[4,] 0.5093882 0.5010190 0.5797248 1.0000000

The emp matrix contains the rank correlations between the empirical values (time series); the sim matrix is the corresponding correlation matrix for the simulated values.

I have read the Q&A How to compare two or more correlation matrices?; in my case it is known that the empirical values do not come from a normal distribution, so I can't use Box's M test.

I need to test the null hypothesis $H_0$: matrices emp and sim are drawn from the same distribution.

Question. Which test can I use? Is it possible to use the Wishart statistic?

Edit. Following Stephan Kolassa's comment, I have done a simulation.

I have tried to compare the two Spearman correlation matrices emp and sim with Box's M test. The test returned

# Chi-squared statistic = 2.6163, p-value = 0.9891
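
For reference, here is a minimal sketch of how such a statistic can be computed directly from two correlation matrices, assuming n = 25 observations behind each matrix (the helper name boxM_stat and the per-matrix sample sizes are my assumptions, not the exact code that produced the value above):

# Box's M statistic M(1-c) for two correlation matrices S1, S2
# with assumed sample sizes n1, n2 (a sketch, not a reference implementation)
boxM_stat <- function(S1, S2, n1, n2) {
  Sp <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)  # pooled matrix
  M  <- (n1 + n2 - 2) * log(det(Sp)) -
        (n1 - 1) * log(det(S1)) - (n2 - 1) * log(det(S2))
  p  <- nrow(S1)                                          # matrix dimension
  c  <- (2 * p^2 + 3 * p - 1) / (6 * (p + 1)) *
        (1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n1 + n2 - 2))
  M * (1 - c)  # approximately chi-squared with p(p+1)/2 df under normality
}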

Then I simulated the correlation matrix sim 1000 times and plotted the distribution of the chi-squared statistic $M(1-c)\sim\chi^2(df)$.
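
A sketch of the simulation loop, reusing the hypothetical boxM_stat helper above and collecting the statistics in dfr$stat:

set.seed(42)  # arbitrary seed, for reproducibility
dfr <- data.frame(stat = replicate(1000, {
  s <- corSample(emp, n = 25)$cor.sample  # new sample correlation matrix
  boxM_stat(emp, s, n1 = 25, n2 = 25)     # its M(1-c) statistic
}))
hist(dfr$stat, breaks = 40, main = "Simulated M(1-c) statistics")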

[Figure: histogram of the simulated chi-squared statistics $M(1-c)$]

After that I computed the 5% quantile of the chi-squared statistic $M(1-c)\sim\chi^2(df)$. It equals

quantile(dfr$stat, probs = 0.05)
#       5% 
# 1.505046

One can see that the 5% quantile is less than the obtained chi-squared statistic: 1.505046 < 2.6163 (blue line on the figure); therefore, the emp statistic $M(1-c)$ does not fall in the left tail of the $(M(1-c))_i$.

Edit 2. Following Stephan Kolassa's second comment, I have calculated the 95% quantile of the chi-squared statistic $M(1-c)\sim\chi^2(df)$ (blue line on the figure). It equals

quantile(dfr$stat, probs = 0.95)
#      95% 
# 7.362071

One can see that the emp statistic $M(1-c)$ does not fall in the right tail of the $(M(1-c))_i$ either.

Edit 3. I have calculated the exact $p$-value (green line on the figure) through the empirical cumulative distribution function:

ecdf(dfr$stat)(2.6163)
[1] 0.239

One can see that $p$-value=0.239 is greater than $0.05$.

References

Reza Modarres & Robert W. Jernigan (1993). A robust test for comparing correlation matrices. Journal of Statistical Computation and Simulation, 46(3-4), 169-181. The first paper I found that makes no normality assumption; two different tests are proposed, of which the quadratic-form test is the simpler one.

Dominik Wied (2014). A nonparametric test for a constant correlation matrix. Econometric Reviews, DOI: 10.1080/07474938.2014.998152. The author proposes a nonparametric procedure to test for changes in a correlation matrix at an unknown point in time.

Joël Bun, Jean-Philippe Bouchaud & Mark Potters (2016). Cleaning correlation matrices. Risk.net, April 2016.

David X. Li (1999). On Default Correlation: A Copula Function Approach. Available at SSRN: https://ssrn.com/abstract=187289 or http://dx.doi.org/10.2139/ssrn.187289

G. E. P. Box (1949). A General Distribution Theory for a Class of Likelihood Criteria. Biometrika, 36(3/4), 317-346.

M. S. Bartlett (1937). Properties of Sufficiency and Statistical Tests. Proc. R. Soc. Lond. A, 160, 268-282.

Robert I. Jennrich (1970). An Asymptotic χ² Test for the Equality of Two Correlation Matrices. Journal of the American Statistical Association, 65(330), 904-912.

Kinley Larntz & Michael D. Perlman (1985). A Simple Test for the Equality of Correlation Matrices. Technical Report No. 63.

Arjun K. Gupta, Bruce E. Johnson & Daya K. Nagar (2013). Testing Equality of Several Correlation Matrices. Revista Colombiana de Estadística, 36(2), 237-258.

Elisa Sheng, Daniela Witten & Xiao-Hua Zhou (2016). Hypothesis testing for differentially correlated features. Biostatistics, 17(4), 677-691.

James H. Steiger (2003). Comparing Correlations: Pattern Hypothesis Tests Between and/or Within Independent Samples.


This is not an answer, only an extended follow-up.

I have simulated the correlation matrix sim $n=1000$ times, calculated the statistics $M(1-c)_i$, $i=1,2,\ldots,n$, and plotted the distribution of the chi-squared statistic (left) and its cumulative distribution function (right).

[Figure: histogram of the simulated statistics (left) and their empirical CDF (right)]

The null hypothesis $H_0$: matrices emp and sim are drawn from the same distribution.

The alternative hypothesis $H_1$: matrices emp and sim are not drawn from the same distribution.

We have a two-tailed test at $α=5\%$. The critical values are:

alpha <- 0.05
x     <- dfr$stat  # simulated M(1-c) statistics from above
q025  <- quantile(x, probs =     alpha/2); q025
#    2.5% 
# 1.222084 
q975  <- quantile(x, probs = 1 - alpha/2);q975
#   97.5% 
# 8.170121 

From the calculation one can see that 1.222084 < M(1-c) = 2.6163 < 8.170121; therefore, we fail to reject $H_0$.
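
Equivalently, a two-sided Monte Carlo p-value can be read off the empirical CDF of the simulated statistics (a sketch, using x and the observed statistic from above):

F_hat <- ecdf(x)  # empirical CDF of the simulated M(1-c) values
p_two <- 2 * min(F_hat(2.6163), 1 - F_hat(2.6163))
p_two  # well above 0.05, consistent with failing to reject H0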

Counter-example. I have simulated a sample xx from the $\chi^2(df)$ distribution and found its sample characteristics:

m  <- 2                # number of matrices
k  <- 4                # size of matrices
df <- k*(k+1)*(m-1)/2  # degrees of freedom
xx <- rchisq(1000, df=df)

# Most frequent value; note that for a continuous sample all values
# are unique, so this effectively returns the first element of x
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(xx) 
# [1] 5.845786
mean(xx)
# [1] 10.1366808
quantile(xx, probs =     alpha/2)
#    2.5% 
# 3.057377 
quantile(xx, probs = 1 - alpha/2)
#   97.5% 
# 19.91842    
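
Since xx is a continuous sample, a kernel-density estimate of the mode is more meaningful than the tabulation-based Mode() above (a sketch; the theoretical mode of $\chi^2(10)$ is $df - 2 = 8$):

d <- density(xx)     # kernel density estimate of the sample
d$x[which.max(d$y)]  # location of the density peak, close to df - 2 = 8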

The sample mean 10.1366808 falls into the right tail of the $M(1-c)$ statistic's distribution (it exceeds the 97.5% quantile 8.170121); therefore, $H_0$ is rejected.

But the sample mode 5.845786 falls into the middle range.

    My first idea would be to simulate your null distributions many times and in each case calculate the test statistic $(M(1-c))_i$ of Box's M test. Then you have this statistic's null distribution for the specific null distribution of your underlying data, no normality necessary, and you can compare whether your `emp`'s $M(1-c)$ falls in the tail of the $(M(1-c))_i$. Would this work? – Stephan Kolassa May 17 '17 at 09:09
  • Perhaps the most important idea behind Box's M test is the statistic it defines to compare the matrices. That statistic *ought* to work well even for Spearman matrices (they are, after all, correlation matrices among data--the ranks). That permits you to apply standard methods such as a permutation test or bootstrapping, as suggested by @Stephan. Of course this intuition needs to be verified and the power of the resulting test needs to be evaluated (which is why I'm writing this as a comment and not an answer). – whuber Sep 12 '17 at 14:33
  • @StephanKolassa, thanks for the idea; it sounds like the permutation test. I have updated my question. – Nick Sep 17 '17 at 01:18
  • Hm. Wouldn't you be interested in the *right* tail of the distribution? I'm not familiar with this particular test, but most tests that use a $\chi^2$ statistic will have a *high* value if the null hypothesis is not true in an "interesting" way. So you'd compare the 95% quantile of your simulated statistics. What do you get then? – Stephan Kolassa Oct 11 '17 at 06:43
  • @StephanKolassa, thank you for your attention to my question. I have updated the text. – Nick Oct 11 '17 at 15:19
  • Your statistic of 2.62 is far below the simulation 95% quantile of 7.36. It seems you can't reject the null hypothesis at the 5% alpha level. (You can actually calculate an exact p value through `ecdf(dfr$stat)(2.6163)`.) – Stephan Kolassa Oct 14 '17 at 21:18
  • @StephanKolassa, thanks, I have calculated ecdf(dfr$stat)(2.6163)=0.239>0.05. Unfortunately, I don't know how to make a decision correctly from this result. – Nick Oct 15 '17 at 08:38
  • Assuming the null hypothesis that Box tests is that the matrices are drawn from the same distribution, then you have failed to reject it – mdewey Oct 15 '17 at 12:34
  • I concur with @mdewey. – Stephan Kolassa Oct 16 '17 at 06:30
  • From some quick reading, it seems Box's M test is a modified likelihood ratio test where the joint distribution of the matrices is assumed to be the product of two Wisharts, which is an assumption based on the normality of the data. So if you could find an approximate or limiting distribution of the Spearman estimator, you could probably derive a new LRT statistic in this way. It might also help to look at the original papers Barley (1937) and Box (1949) to get a sense of their modifications (they're on JSTOR). – deasmhumnha Apr 09 '18 at 10:09
  • @DezmondGoff, did you mean "M. S. Bartlett, Properties of Sufficiency and Statistical Tests. Proc. R. Soc. Lond. A 1937 160, 268-282"? – Nick Apr 10 '18 at 02:14
  • @Nick Ha, yeah. Autocorrect strikes again. – deasmhumnha Apr 10 '18 at 07:17
  • I wish I had more time to investigate this but an avenue that has not been mentioned is the work done on the analysis of covariance operators as objects and in particular tests about their equality or distance (eg. [Pigoli et al. 2014](https://academic.oup.com/biomet/article/101/2/409/1778250), [Fremdt et al. 2013](https://arxiv.org/abs/1104.4049), etc.) This naturally extends to correlation operators (eg. [Wied 2017](https://www.tandfonline.com/doi/abs/10.1080/07474938.2014.998152)) – usεr11852 Apr 15 '18 at 10:55
  • @usεr11852, are you familiar with R packages for correlation operators? – Nick Apr 16 '18 at 04:13
  • No, I am not familiar at this point. – usεr11852 Apr 16 '18 at 17:38
  • @StephanKolassa, I read your comments again. The alternative hypothesis $H_1$: matrices emp and sim are drawn from different distributions. In this case we have a two-tailed test. Where should the test statistic be? Between the quantile values q(0.025) and q(0.975) or not? – Nick May 24 '18 at 08:18
  • If you do want a two-tailed test at $\alpha=5\%$, then yes. – Stephan Kolassa May 24 '18 at 08:38

1 Answer


Since we are working with Spearman correlation matrices constructed from the same set of ranks, the simple method presented in the 2012 work A simple procedure for the comparison of covariance matrices may be of value.

In particular, to quote:

Here I propose a new, simple method to make this comparison in two population samples that is based on comparing the variance explained in each sample by the eigenvectors of its own covariance matrix with that explained by the covariance matrix eigenvectors of the other sample. The rationale of this procedure is that the matrix eigenvectors of two similar samples would explain similar amounts of variance in the two samples. I use computer simulation and morphological covariance matrices from the two morphs in a marine snail hybrid zone to show how the proposed procedure can be used to measure the contribution of the matrices orientation and shape to the overall differentiation.

Of particular import are the claimed results and conclusions:

Results I show how this procedure can detect even modest differences between matrices calculated with moderately sized samples, and how it can be used as the basis for more detailed analyses of the nature of these differences.

Conclusions The new procedure constitutes a useful resource for the comparison of covariance matrices. It could fill the gap between procedures resulting in a single, overall measure of differentiation, and analytical methods based on multiple model comparison not providing such a measure.

And further comments from the available full text:

In the present work I propose a new, simple and distribution-free procedure for the exploration of differences between covariance matrices that, in addition to providing a single and continuously varying measure of matrix differentiation, makes it possible to analyse this measure in terms of the contributions of differences in matrix orientation and shape. I use both computer simulation and P matrices corresponding to snail morphological measures to compare this procedure with some widely used alternatives. I show that the new procedure has power similar or better than that of the simpler methods, and how it can be used as the basis for more detailed analyses of the nature of the found differences.
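
To illustrate the core idea, here is a minimal sketch of the comparison as I read it (my own paraphrase, not the author's reference implementation): compare the variance each matrix's own eigenvectors explain in it with the variance explained by the other matrix's eigenvectors.

# Variance of S captured along the orthonormal directions in V
explained <- function(S, V) diag(t(V) %*% S %*% V)

V_emp <- eigen(emp)$vectors
V_sim <- eigen(sim$cor.sample)$vectors

# Similar matrices should give similar "own" and "cross" profiles
cbind(own   = explained(emp, V_emp),
      cross = explained(emp, V_sim))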

If other methods prove less impressive, you may wish to investigate the above further for the comparison of rank correlation matrices by performing your own simulation testing.
