I'm trying to calculate correlation using a formula in Statistics 4th Edition by Freedman:

r = average of (x in standard units) * (y in standard units)

If I try this out ...

x = 1:7
y = c(6,7,5,4,3,1,2)

x.z = scale(x)
y.z = scale(y)

prod = x.z * y.z
mean(prod)
[1] -0.7959184

However, if I use the built-in `cor`, I get a different answer:

cor(x, y)
[1] -0.9285714

Looking through the worked examples in the book, the standard values for x and y seem to be rounded to the nearest 0.5, so I round my values and I get the expected answer:

x.z.round = round(x.z/0.5)*0.5 
y.z.round = round(y.z/0.5)*0.5 

prod.round = x.z.round * y.z.round
mean(prod.round)
[1] -0.9285714

Why do the x and y scaled values seemingly need to be rounded to the nearest 0.5?

Chris Snow
  • The answer is that `cor` does not implement the correlation coefficient as defined in your reference textbook. It's important to consult its documentation (type `?cor`) and compare its definition to the one your book is using. – whuber Dec 20 '18 at 14:32

1 Answer

You made a mistake: the standardized form of Pearson's correlation coefficient divides by n-1, not n, when the standard units come from the sample standard deviation, which is what `scale` uses. So if you use sum(prod)/6, you get the correct result.

> sum(prod)/6
[1] -0.9285714
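
As a rough check of the divisor issue, the sketch below recomputes the correlation for the question's data in two consistent ways and compares both with `cor`. It assumes, as whuber's comment states, that the textbook's SD divides by n; `pop_z` is a hypothetical helper written for this illustration, not a base R function.

```r
# Data from the question
x <- 1:7
y <- c(6, 7, 5, 4, 3, 1, 2)
n <- length(x)

# Hypothetical helper: standard units using the SD that divides by n
# (assumed to match the textbook's convention)
pop_z <- function(v) {
  (v - mean(v)) / sqrt(mean((v - mean(v))^2))
}

# Divisor n in both the SD and the average of products
r_n <- mean(pop_z(x) * pop_z(y))

# Divisor n - 1 in both: scale() uses the sample SD, so sum / (n - 1)
r_n1 <- sum(scale(x) * scale(y)) / (n - 1)

# Mixing the two (sample SD, then mean) shrinks the result by (n - 1) / n
r_mixed <- mean(scale(x) * scale(y))

print(all.equal(r_n, cor(x, y)))
print(all.equal(r_n1, cor(x, y)))
print(all.equal(r_mixed, cor(x, y) * (n - 1) / n))
```

Either convention reproduces `cor(x, y)` as long as the same divisor is used for the SD and for averaging the products; the mismatch only appears when `scale` (divisor n-1) is combined with `mean` (divisor n).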
user2974951
  • Incredible! So was it just by chance that rounding came up with the right answer? – Chris Snow Dec 20 '18 at 10:43
  • @ChrisSnow Trying your code on a different sample produces incorrect results, so this may very well be just a coincidence. – user2974951 Dec 20 '18 at 10:53
  • **The OP did not make a mistake.** The textbook they cite *always* divides by $n,$ not $n-1.$ See https://stats.stackexchange.com/a/3932/919 for a fuller account of this. Thus, the correct answer is to multiply the result of `cor` by the square root of $n/(n-1)$ rather than to accept what the software tells you! – whuber Dec 20 '18 at 14:32