12

Background: I read an article in which the authors report a Pearson correlation of 0.754 from a sample size of 878. The resulting p-value for the correlation test is reported as "two star" significant (i.e. p < 0.01). However, I think that with such a large sample size the corresponding p-value should be less than 0.001 (i.e. "three star" significant).

  • Can the p-value for this test be computed just from the Pearson correlation coefficient and the sample size?
  • If yes, how can this be done in R?
Jeromy Anglim
sitems
  • For those interested, here is an [online p-value calculator that takes r and n](http://www.danielsoper.com/statcalc3/calc.aspx?id=44). – Jeromy Anglim Jun 06 '13 at 08:33

2 Answers

14

Yes, it can be done if you use Fisher's R-to-z transformation. Other methods (e.g. the bootstrap) can have some advantages but require the original data. In R (r is the sample correlation coefficient, n is the number of observations):

z <- 0.5 * log((1+r)/(1-r))  # Fisher's r-to-z transformation (equivalent to atanh(r))
zse <- 1/sqrt(n-3)           # approximate standard error of z
min(pnorm(z, sd=zse), pnorm(z, lower.tail=FALSE, sd=zse))*2  # two-sided p-value
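
As a quick check with the numbers from the question (r = 0.754, n = 878), the same computation gives a p-value far below 0.001:

r <- 0.754  # correlation reported in the article
n <- 878    # sample size reported in the article
z <- 0.5 * log((1+r)/(1-r))
zse <- 1/sqrt(n-3)
2 * pnorm(abs(z)/zse, lower.tail=FALSE)  # essentially zero, far below 0.001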

See also this post on my blog.

That said, whether it is .01 or .001 doesn't matter that much. As you said, this is mostly a function of sample size and you already know that the sample size is large. The logical conclusion is that you probably don't even need a test at all (especially not a test of the so-called ‘nil’ hypothesis that the correlation is 0). With N = 878, you can be quite confident in the precision of the estimate and focus on interpreting it directly (i.e. is .75 large in your field?).
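
As an illustration of that precision, a 95% confidence interval can be computed from the same Fisher z quantities (a minimal sketch using the r and n from the question; it matches the 0.724 to 0.781 interval mentioned in the comments below):

r <- 0.754; n <- 878
z <- atanh(r)                            # same as 0.5 * log((1+r)/(1-r))
zse <- 1/sqrt(n-3)
tanh(z + c(-1, 1) * qnorm(0.975) * zse)  # approximately (0.724, 0.781)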

Formally however, when you do a statistical test in the Neyman-Pearson framework, you need to specify the error level in advance. So, if the results of the test really matter and the study was planned with .01 as the threshold, it only makes sense to report p < .01 and you should not opportunistically make it p < .001 based on the obtained p value. This type of undisclosed flexibility is even one of the main reasons behind criticism of little stars and more generally of the way null-hypothesis significance testing is practiced in social science.

See also Meehl, P.E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46 (4), 806-834. (The title contains a reference to these “stars” but the content is a much broader discussion of the role of significance testing.)

Nick Cox
Gala
  • Ok, but suppose that you review an article with such results. You have to compute the p-value if you want to warn the authors that their results are not correct. – sitems Jun 06 '13 at 06:37
  • I would probably advise them to give up the little stars, even if the results are correct but I see your point. – Gala Jun 06 '13 at 06:50
  • I edited my answer to add a remark about this problem. Note that 0.001 < 0.01, so the authors are formally “correct” in any case; it's more a matter of what the way the results are reported implies. I would think that, unlike an outright error that a reviewer should of course correct, this issue should be left to the authors to decide. – Gala Jun 06 '13 at 07:02
  • You are right, but so far I have never seen p < 0.01 reported when p is actually less than 0.001 (without saying that the significance level for the article is 0.01). Moreover, in the article that I speak about, the authors report 30 correlation tests based on sample sizes ranging from 837 to 886 with correlations ranging from 0.145 to 0.754, and all are reported as two star significant. – sitems Jun 06 '13 at 07:12
  • Well, obviously I don't know about the specifics or the customs in your field but I can tell you that, as an author, I would be pissed off if a reviewer demanded opportunistic stars in a table, especially considering that anyone who cared could redo the computation themselves using the correlation coefficient and whatever error level they like. The most important thing in my view is to ensure that the correlations and sample sizes are clearly reported. – Gala Jun 06 '13 at 07:19
  • I have a problem posting my code here, but I ran simulations and the p-value from your code is not the same as the p-value from cor.test. – sitems Jun 06 '13 at 08:14
  • I discussed that in my blog post; it's not exactly the same test. The difference is minimal, especially for large sample sizes (see the sketch after these comments). The confidence interval is exactly the same, though. – Gala Jun 06 '13 at 08:16
  • I tried a very small sample size (n=5). After increasing it, the results are really very similar, thank you a lot Gael. – sitems Jun 06 '13 at 08:19
  • I wrote a tutorial review of the use of Fisher's z for correlations accessible at http://www.stata-journal.com/sjpdf.html?articlenum=pr0041 I'd recommend more use of confidence intervals and calculate 0.724, 0.781 as 95% limits. I'd recommend even more looking at the data and working out a regression. – Nick Cox Jun 06 '13 at 12:47
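
A minimal, hypothetical sketch of the small-sample difference discussed in the comments above (made-up data, not the article's; the Fisher z p-value is compared with the one from cor.test()):

set.seed(1)                      # made-up illustration data
x <- rnorm(5); y <- x + rnorm(5)
r <- cor(x, y); n <- length(x)
z <- atanh(r); zse <- 1/sqrt(n-3)
p_z <- 2 * pnorm(abs(z)/zse, lower.tail=FALSE)  # Fisher z test
p_t <- cor.test(x, y)$p.value                   # t test used by cor.test()
c(p_z, p_t)  # the two differ at such a small n and agree more closely as n grows
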
2

> you use Fisher's R-to-z transformation.

There's an alternative statistic:

t = |r| * sqrt((n - 2) / (1 - r^2))

which has a t-distribution with n - 2 degrees of freedom. This is how, for example, this calculator works: http://www.danielsoper.com/statcalc3/calc.aspx?id=44
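
In R, a minimal sketch of this approach (using r and n as above; the resulting two-sided p-value matches what cor.test() reports for a Pearson correlation):

t_stat <- abs(r) * sqrt((n - 2) / (1 - r^2))    # t statistic for testing rho = 0
2 * pt(t_stat, df = n - 2, lower.tail = FALSE)  # two-sided p-value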

Germaniawerks