Why do I need to round x and y values to the nearest 0.5 when manually calculating correlation?

Question

I'm trying to calculate correlation using a formula in Statistics 4th Edition by Freedman:

r = average of (x in standard units) * (y in standard units)

If I try this out ...

x = 1:7
y = c(6,7,5,4,3,1,2)

x.z = scale(x)
y.z = scale(y)

prod = x.z * y.z
mean(prod)
[1] -0.7959184

However, if I use the builtin cor I get a different answer:

cor(x, y)
[1] -0.9285714

Looking through the worked examples in the book, the standard values for x and y seem to be rounded to the nearest 0.5, so I round my values and I get the expected answer:

x.z.round = round(x.z/0.5)*0.5 
y.z.round = round(y.z/0.5)*0.5 

prod.round = x.z.round * y.z.round
mean(prod.round)
[1] -0.9285714

Why do the x and y scaled values seemingly need to be rounded to the nearest 0.5?

The answer is that `cor` does not implement the correlation coefficient as defined in your reference textbook. It's important to consult its documentation (type `?cor`) and compare its definition to the one your book is using. – whuber, Dec 20 '18 at 14:32

score 5 · Accepted Answer · answered Dec 20 '18 at 10:31

You made a mistake: the formula for Pearson's correlation coefficient (in its standardized form) divides the sum of the products by n-1, not n, because scale() standardizes with the sample standard deviation. So if you use sum(prod)/6 you get the correct result.

sum(prod)/6
[1] -0.9285714
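As a minimal sketch of why the divisor matters, the block below computes r under three conventions using the data from the question. The helper `pop_z` and the names `r_sample`, `r_pop`, and `r_mixed` are made up for illustration and appear nowhere in the thread. It assumes only that `scale()` standardizes with `sd()`, which divides by n-1.

```r
x <- 1:7
y <- c(6, 7, 5, 4, 3, 1, 2)
n <- length(x)

# Sample convention: scale() standardizes with sd() (divisor n - 1),
# so the sum of products is divided by n - 1 as well.
r_sample <- sum(scale(x) * scale(y)) / (n - 1)

# Population convention (pop_z is a hypothetical helper, not base R):
# standardize with an SD that divides by n, then take a plain mean.
pop_z <- function(v) (v - mean(v)) / sqrt(mean((v - mean(v))^2))
r_pop <- mean(pop_z(x) * pop_z(y))

# Mixed convention from the question: n - 1 inside scale(), n inside mean().
r_mixed <- mean(scale(x) * scale(y))

# The two consistent conventions agree with cor(); the mixed one is
# smaller in magnitude by the factor (n - 1) / n.
print(c(sample = r_sample, population = r_pop, builtin = cor(x, y)))
print(c(mixed = r_mixed, shrunk_builtin = cor(x, y) * (n - 1) / n))
```

Under these assumptions, either convention reproduces `cor(x, y)` as long as the same divisor is used for the standard deviations and for the average. Mixing the two gives the -0.7959184 seen in the question.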

user2974951

Incredible! So was it just by chance that rounding came up with the right answer? – Chris Snow Dec 20 '18 at 10:43
@ChrisSnow Trying your code for a different sample produces incorrect results, so this may very well be just a coincidence. – user2974951 Dec 20 '18 at 10:53
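A check of this kind can be sketched as follows. The five-point sample `x2`, `y2` and the helper `round_half` are hypothetical, chosen only for illustration, and are not the sample the commenter tried.

```r
# Hypothetical sample, not taken from the thread.
x2 <- 1:5
y2 <- c(2, 1, 4, 3, 5)

# Round standardized values to the nearest 0.5, as in the question.
round_half <- function(z) round(z / 0.5) * 0.5

r_rounded <- mean(round_half(scale(x2)) * round_half(scale(y2)))
r_builtin <- cor(x2, y2)

print(c(rounded = r_rounded, builtin = r_builtin))
```

For this sample the rounded calculation gives 0.75 while `cor` gives 0.8, so the agreement seen with the original data does not carry over.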

**The OP did not make a mistake.** The textbook they cite *always* divides by $n,$ not $n-1.$ See https://stats.stackexchange.com/a/3932/919 for a fuller account of this. Thus, the correct answer is to multiply the result of `cor` by the square root of $n/(n-1)$ rather than to accept what the software tells you! – whuber Dec 20 '18 at 14:32
