
Something puzzles me in R:

  1. Why are there differences in my correlations depending on the function/package I use, and
  2. Which package::function should I choose in which circumstances?

Consider the following three examples using the iris dataset:

stats::cor.test

cor.test(iris$Petal.Length, iris$Petal.Width, method = "pearson")
#t = 43.32, df = 148, p-value < 2.2e-16
#cor = 0.9627571

stats::lm

summary(lm(iris$Petal.Length ~ iris$Petal.Width))
#(Intercept)       1.09057    0.07294   14.95   <2e-16 ***
#iris$Petal.Width  2.22589    0.05138   43.32   <2e-16 ***
#Multiple R-squared:  0.9269,   Adjusted R-squared:  0.9264 
#F-statistic:  1877 on 1 and 148 DF,  p-value: < 2.2e-16

lsr::correlate

correlate(iris$Petal.Length, iris$Petal.Width, test = TRUE)
# Correlation
#      y.var   
#x.var 0.963***
#p-value
#       y.var
# x.var 0.000

They all give similar values. For instance, the p-value in this example is always the same. The R is also really close, ranging from 0.9264 to 0.963, and is in fact identical for stats::cor.test and lsr::correlate.

    stats::lm is not giving you the correlation coefficient. It's giving you the R². The correlation values from lsr::correlate and stats::cor.test are about 0.9627. Square that and you get 0.9627² = 0.9268, which is almost identical to the R² value from stats::lm. I expect any differences there are due to rounding. – mkt Mar 30 '18 at 15:58

1 Answer


Turning my comment above into an answer:

stats::lm is NOT giving you the correlation coefficient (i.e. r). It's giving you the R².

The correlation values from lsr::correlate and stats::cor.test are about 0.9627. Square that and you get 0.9627² = 0.9268, which is almost identical to the R² value from stats::lm. The small differences there are almost certainly due to rounding.
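You can check this directly in R (a quick sketch using the built-in iris data; note the capitalised column names Petal.Length / Petal.Width):

```r
# Pearson correlation coefficient r
r <- cor(iris$Petal.Length, iris$Petal.Width)

# R-squared from the corresponding simple linear regression
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
r2  <- summary(fit)$r.squared

r                   # ~0.963, what cor.test and correlate report
r^2                 # ~0.927, what summary(lm(...)) reports
all.equal(r^2, r2)  # TRUE up to floating-point tolerance
```

In a simple regression with one predictor, R² is exactly the square of the Pearson correlation between the two variables, so any discrepancy between the printed outputs of the three functions is purely rounding.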

mkt