
While reading the Wikipedia article on the phi coefficient, I noticed the claim that, for paired binary data, "a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient".

A quick simulation suggested that this is not the case: the two values differ. However, the phi coefficient does appear to approximate Pearson's correlation coefficient when the sample is large.

x <- c(1,   1,  0,  0,  1,  0,  1,  1,  1)
y <- c(1,   1,  0,  0,  0,  0,  1,  1,  1)
cor(x,y)
sqrt(chisq.test(table(x,y))$statistic/length(x)) # phi

x <- rep(x, 1000)
y <- rep(y, 1000)
sqrt(chisq.test(table(x,y))$statistic/length(x)) # phi
# it now DOES approximate Pearson's correlation.
cor(x,y)

But it is not apparent to me why (mathematically) this is the case.

Andy
Tal Galili

1 Answer


By default, chisq.test() applies a continuity correction when computing the test statistic for 2x2 tables. If you switch off this behavior, then:

x = c(1,  1,  0,  0,  1,  0,  1,  1,  1)
y = c(1,  1,  0,  0,  0,  0,  1,  1,  1)
cor(x,y)
sqrt(chisq.test(table(x,y), correct=FALSE)$statistic/length(x)) # phi

will give you exactly the same answer. And this essentially also answers why $\sqrt{\chi^2/n}$ with the continuity correction approximates cor(x,y) -- as $n$ increases, the continuity correction has less and less influence on the result.
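To see why the two agree exactly without the correction, it may help to write both quantities in terms of the 2x2 cell counts. The notation here is introduced for illustration and is not part of the original answer: let $a$, $b$, $c$, $d$ be the numbers of pairs with $(x,y) = (1,1), (1,0), (0,1), (0,0)$, and let $n = a+b+c+d$. A sketch of the algebra:

$$
\begin{aligned}
\sum_i (x_i-\bar x)(y_i-\bar y) &= a - \frac{(a+b)(a+c)}{n} = \frac{ad-bc}{n},\\
\sum_i (x_i-\bar x)^2 &= \frac{(a+b)(c+d)}{n}, \qquad \sum_i (y_i-\bar y)^2 = \frac{(a+c)(b+d)}{n},\\
r &= \frac{ad-bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}},\\
\chi^2 &= \frac{n\,(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)} \quad\Longrightarrow\quad \sqrt{\chi^2/n} = |r|.
\end{aligned}
$$

With Yates's correction, and assuming $|ad-bc| \ge n/2$ so that the full correction of $1/2$ per cell applies, the same steps give

$$
\sqrt{\chi^2_{\text{Yates}}/n} = |r| - \frac{n/2}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}.
$$

Replicating the data $k$ times multiplies $n$ by $k$ and the square root in the denominator by $k^2$, so the correction term shrinks like $1/k$ while $|r|$ stays fixed. Note that $\sqrt{\chi^2/n}$ recovers only the absolute value of $r$; the sign is lost.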

The continuity correction is described here: Yates's correction for continuity
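As a rough numerical check of how quickly the correction fades, the sketch below replicates the example data `k` times and records the gap between `abs(cor(x, y))` and the Yates-corrected $\sqrt{\chi^2/n}$. The helper name `phi_gap` and the chosen values of `k` are illustrative assumptions, not part of the original answer.

```r
# Example data from the question
x <- c(1, 1, 0, 0, 1, 0, 1, 1, 1)
y <- c(1, 1, 0, 0, 0, 0, 1, 1, 1)

# Hypothetical helper: gap between |r| and the Yates-corrected phi
# after replicating the data k times
phi_gap <- function(k) {
  xk <- rep(x, k)
  yk <- rep(y, k)
  # chisq.test() warns about small expected counts for k = 1; the
  # statistic itself is still computed, so the warning is suppressed
  stat <- suppressWarnings(
    chisq.test(table(xk, yk), correct = TRUE)$statistic
  )
  phi_corrected <- sqrt(unname(stat) / length(xk))
  abs(cor(xk, yk)) - phi_corrected
}

gaps <- sapply(c(1, 10, 100, 1000), phi_gap)
print(gaps)
```

For these data the gap should shrink in proportion to `1/k`, which is consistent with the corrected value approaching `cor(x, y)` after `rep(x, 1000)`.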

Wolfgang
  • I am also interested as to why the two would give the exact same value. Should I ask this on a different question? – Tal Galili Jan 17 '13 at 22:25
  • @Tal, for **binary** data, that is, for a 2x2 contingency table, _phi_ degenerates into abs(_r_). This can be shown by the fact that their classic formulas can be reduced to the common one shown [here](http://stats.stackexchange.com/questions/26105/what-is-the-difference-between-verifying-how-strong-is-the-relationship-of-varia/26111#26111) – ttnphns Jan 18 '13 at 08:54
  • Does anyone have a textbook or journal reference for this? I'd love to be able to give to my clients so they could cite it... – emudrak Aug 01 '14 at 17:52