
It looks like R's `cor.test` returns p-values of exactly zero if the real p-value is very low. For example:

> sprintf('%e', cor.test(1:100, c(1.2, 1:98, 1.1), method='spearman')$p.value)
[1] "0.000000e+00"

In SciPy this same test results in a very low, but nonzero, p-value:

> print scipy.stats.spearmanr(range(100), [1.2]+range(98)+[1.1])
(0.94289828982898294, 1.3806191275561446e-48)

Presumably the p-value gets rounded down to 0 once it becomes so small that R can no longer represent it with its normal floating-point type? Is there a simple way to obtain the exact number, or is reporting p < 2.2e-16 the best I can do?
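
If the goal is an order of magnitude rather than an exact value, one workaround is to evaluate the asymptotic t approximation for Spearman's rho on the log scale so the tail probability never underflows; this appears to be the same approximation SciPy reports above. This is only a sketch with my own variable names, not necessarily the computation `cor.test` performed to produce the 0:

x <- 1:100
y <- c(1.2, 1:98, 1.1)

rho <- cor(x, y, method = "spearman")
n   <- length(x)

## t statistic for the asymptotic approximation: t = rho * sqrt((n - 2) / (1 - rho^2))
tstat <- rho * sqrt((n - 2) / (1 - rho^2))

## two-sided p-value on the log scale; pt(..., log.p = TRUE) avoids underflow
log_p <- log(2) + pt(-abs(tstat), df = n - 2, log.p = TRUE)
log_p        # roughly -110, i.e. p is on the order of 1e-48
exp(log_p)   # close to SciPy's 1.38e-48 (still representable, so exp() is safe here)

.Machine$double.eps   # 2.220446e-16, where the familiar "p < 2.2e-16" comes from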

  • See the help page for `cor.test` where it states "pKendall and pSpearman in package SuppDists, spearman.test in package pspearman, which supply different (and often more accurate) approximations." Loading `SuppDists`, extracting `$estimate` from your `cor.test` result, and passing it to `pSpearman` gives an allegedly exact value. – whuber Jun 29 '15 at 22:20 (a sketch of this appears after the comments)
  • Tests in vanilla R simply don't report p-values [lower than `.Machine$double.eps`](http://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1/78840#78840). See that link for an explanation of why it makes little sense to discuss any notion of "exact" p values anywhere near that small anyway. It's a bit like arguing about how many angels can dance on the head of a pin, when you can only look at a different type of pin to the kind you want to discuss. The extreme tails depend heavily on assumptions like between-point independence. – Glen_b Jun 30 '15 at 01:09
  • Ben Bolker's comment under another answer at that link is especially apt: "such small p-values [...] are so tiny that the probability that the NSA broke in and tampered with your data [...] is far, far, higher than the nominal p-value." (I sometimes give examples that relate to cosmic rays flipping a few important bits in your data, but I think his example is probably more apt.) – Glen_b Jun 30 '15 at 01:21
  • @Glen_b It is nevertheless worthwhile paying attention to such things. Tiny differences among unusual values can sometimes be the only evident indications that something is wrong with an algorithm: they are the proverbial canaries. In this case it is also intriguing that SciPy's Spearman $\rho$ statistic (as well as its p-value) differs from that in `R`, even though they agree to many d.p. Although these differences may be inconsequential, upon observing them we must immediately mistrust the output of *both* programs for *all* inputs until we understand the reason for the discrepancy. – whuber Jun 30 '15 at 13:23
  • @whuber certainly the difference in $\rho$ values is important; with that, we'd expect a difference in p-values. I simply didn't address that aspect in my comments. – Glen_b Jun 30 '15 at 16:10
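
For reference, a rough sketch of the `SuppDists` route whuber describes in the first comment. The argument order for `pSpearman()` (statistic first, then the number of observations) and the behaviour of `lower.tail` are my reading of the package help page, so check `?pSpearman` before relying on this:

# install.packages("SuppDists")   # if not already installed
library(SuppDists)

ct  <- cor.test(1:100, c(1.2, 1:98, 1.1), method = "spearman")
rho <- unname(ct$estimate)

## upper-tail probability of a Spearman rho at least this large with 100 pairs
## (argument order and lower.tail assumed from ?Spearman in SuppDists)
p_upper <- pSpearman(rho, 100, lower.tail = FALSE)
2 * p_upper   # doubled for a two-sided test; the doubling is my choice, not part of the comment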

0 Answers