How to evaluate level of significance of two similar correlations?

Question

I have four datasets: A1, A2, B1, B2. Every dataset has between 100 and 300 items.

Every item in every dataset has two values: x and y.

The goal:

Find which datasets have similar x values.
If two datasets have similar x values, are their correlations between x and y also similar? And vice versa.

Using a t-test on the x values, I found that A1 and A2 are not too different (their means are not significantly different). The same holds for B1 and B2. However, each of the A datasets is significantly different from each of the B datasets. In list form:

A1.x and A2.x - similar
B1.x and B2.x - similar
A1.x and (B1.x or B2.x) - different
A2.x and (B1.x or B2.x) - different

Now I want to know whether the correlation between x and y is the same for A1 and A2, while differing from the correlation in B1 and B2 (which should again be the same as each other). I calculated these correlations and got:

correlation of A1.x and A1.y = 0.487
correlation of A2.x and A2.y = 0.460
correlation of B1.x and B1.y = 0.598
correlation of B2.x and B2.y = 0.610

Main question: What test should I use to measure how significant this similarity or difference between the correlations is? It could still be just a coincidence.

Other question: Is the t-test a good way to assess whether two datasets come from the same process? Should I also run it on the y values in this case?

I hope it is clear what I need. If not, please comment on what is unclear and I will do my best to explain.

score 1 · Accepted Answer · edited Apr 13 '17 at 12:44

Diedenhofen & Musch (2015, PLoS ONE) discuss various tests for significant differences between measured correlations, with pointers to literature. They also discuss confidence intervals. Unfortunately, the companion cocor package for R was removed from CRAN - apparently it failed automated checks during an R upgrade, and the authors did not address these issues in a timely manner.
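As a minimal sketch of one such test, the snippet below applies Fisher's z-transformation to compare two correlations measured on independent samples, which is a standard approach for that case. The function name `fisher_z_test` is made up for illustration, and the sample sizes of 200 are an assumption: the question only says each dataset has between 100 and 300 items, so substitute the real counts.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of H0: equal population correlations (independent samples).

    Hypothetical helper, not taken from any package.
    """
    # Fisher's z-transformation makes the sampling distribution roughly normal.
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between the two transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Correlations from the question; n = 200 is an ASSUMED sample size.
z, p = fisher_z_test(0.487, 200, 0.598, 200)
print(f"A1 vs B1: z = {z:.3f}, p = {p:.3f}")
```

A large p-value here only means the data are consistent with equal correlations; it does not show that the correlations are equal.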

Regarding your other question, it depends on what you are interested in. If you are only interested in whether the $x$ distributions have the same mean, a t test is appropriate. (Assuming equal or different variances, as the case may be.) You could also test whether variances are equal, e.g., using an F test. Alternatively, you could use a two-sample Kolmogorov-Smirnov test to assess whether the two samples come from the same underlying distribution.
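The three checks mentioned above could be run roughly as follows, assuming NumPy and SciPy are available. The helper name `compare_samples` and the synthetic data are illustrative only; replace the arrays with the actual x (or y) values of two datasets.

```python
import numpy as np
from scipy import stats

def compare_samples(a, b):
    """Return p-values for three two-sample checks (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Welch's t test: equal means, without assuming equal variances.
    welch = stats.ttest_ind(a, b, equal_var=False)
    # Two-sided F test for equal variances (sensitive to non-normality).
    f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    f_p = 2.0 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    # Two-sample Kolmogorov-Smirnov test: same underlying distribution.
    ks = stats.ks_2samp(a, b)
    return {
        "welch_t_p": float(welch.pvalue),
        "f_test_p": float(min(f_p, 1.0)),
        "ks_p": float(ks.pvalue),
    }

# Synthetic stand-ins for two datasets; NOT the data from the question.
rng = np.random.default_rng(0)
a1_x = rng.normal(loc=0.0, scale=1.0, size=200)
b1_x = rng.normal(loc=1.0, scale=1.0, size=200)
print(compare_samples(a1_x, b1_x))
```

Passing `equal_var=True` to `ttest_ind` gives the classical equal-variance t test instead of Welch's version.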

answered May 25 '16 at 11:26

Stephan Kolassa

OK, and what about the meaning of the correlations? Is that enough "to prove" that the datasets are probably from the same "origin", or do I need some other test ("to prove" something about the correlations)? – matousc May 25 '16 at 12:12
You are asking a question that goes to the heart of null hypothesis significance testing. NHST can never *prove* anything. It will always only check whether your data are consistent with a default "null hypothesis" - in your case, that the two $x$ vectors come from the same population, respectively that the population correlations are equal. Yes, this is a problem. You may want to browse through [questions tagged "significance-testing"](http://stats.stackexchange.com/questions/tagged/statistical-significance?sort=votes&pageSize=50), or consider Bayesian approaches. – Stephan Kolassa May 25 '16 at 16:02
