4

I understand the logic of standardising raw data that are measured on different scales into z scores so that they can be compared, for example comparing a score of 75 out of 100 in one test with a score of 65 out of 120 in another test.

  • Can I run the Spearman's correlation test on the z scores then? (i.e. forget about the raw data).
  • How do I then relate the result back to the raw data?
Jeromy Anglim
Jasty West
  • 4
    Since Spearman correlation is based on ranks, it will be the same on raw variables and on monotonically transformed variables (such as z scores). But what do you mean by "relate back to the raw data"? – ttnphns Aug 07 '11 at 14:11
  • 5
    After eight questions, don't you think it's time to (a) register yourself and (b) mark some of the previous answers as accepted? Don't wear out your welcome. – whuber Aug 07 '11 at 15:11
  • This is an unfortunate comment. I apologise if I have offended anyone. I have greatly benefited from the knowledge shared on this site. I may not ask another question now. Thanks for helping me out. I will always remain grateful. – Jasty West Aug 08 '11 at 10:54
  • 2
    @Jasty, you are welcome, and your questions are welcome (I've found most of your questions quite interesting); whuber is introducing you to the norms of the stack exchange community; an important norm when asking questions is that you mark answers that answer your question as "answered". These actions improve the quality of the site. Most people are assumed to be aware of these norms, thus repeatedly ignoring them may seem rude to some. In your case I assume you are just not familiar with stack exchange sites. See whuber's comment as a simple nudge about how to use the site in the future. – Jeromy Anglim Aug 08 '11 at 11:49
  • No offense was intended. I apologize sincerely for leaving that impression. Perhaps you will appreciate [@chl's reminder](http://stats.stackexchange.com/questions/13853/what-is-the-best-way-of-weighing-cardinal-scores-and-likert-scale-scores-to-creat/13882#13882) a little better and take appropriate action in response. – whuber Sep 06 '11 at 22:20

2 Answers

3

Spearman's correlation is just Pearson's correlation using ranks (see the Wikipedia page), so any transformation of the data that preserves their ordering (and so gives the same ranks) will give precisely the same value for Spearman's correlation.
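As a minimal sketch of this point, the R snippet below uses made-up scores for two hypothetical tests (`test_a` out of 100, `test_b` out of 120; the numbers and variable names are illustrative assumptions, not data from the question). It checks that z-scoring leaves the ranks untouched and that Spearman's correlation is unchanged by z-scoring or by other increasing transformations such as `log` and `sqrt`.

```r
# Hypothetical test scores on two different scales (made-up numbers).
test_a <- c(75, 62, 88, 54, 91, 70)    # marks out of 100
test_b <- c(65, 80, 101, 58, 110, 72)  # marks out of 120

# z-scores: subtract the mean and divide by the standard deviation.
# scale() returns a one-column matrix, so flatten it to a plain vector.
z_a <- as.vector(scale(test_a))
z_b <- as.vector(scale(test_b))

# Standardising does not reorder the observations, so the ranks match.
same_ranks <- identical(rank(test_a), rank(z_a)) &&
  identical(rank(test_b), rank(z_b))

# Spearman's correlation on raw scores, z-scores, and two other
# strictly increasing transformations of the raw scores.
rho_raw <- cor(test_a, test_b, method = "spearman")
rho_z   <- cor(z_a, z_b, method = "spearman")
rho_mon <- cor(log(test_a), sqrt(test_b), method = "spearman")

print(same_ranks)
print(all.equal(rho_raw, rho_z))
print(all.equal(rho_raw, rho_mon))
```

Because the ranks of the z scores are the ranks of the raw scores, a Spearman correlation computed on z scores can be read directly as the Spearman correlation of the raw data; no back-transformation is needed.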

Pearson's correlation doesn't have that property, but it is unaffected by any linear transformation with positive slope of either variable (or both), say $x^*=ax+b$ and $y^*=cy+d$ with $a, c > 0$. This includes the z-score transformation (which I call standardizing).
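A short derivation may help show why, using the standard definition of Pearson's correlation as covariance divided by the product of standard deviations (the notation $r_{xy}$, $\operatorname{cov}$ and $\operatorname{sd}$ is introduced here for illustration). For $x^*=ax+b$ and $y^*=cy+d$ with $a, c \neq 0$:

$$
r_{x^*y^*}
= \frac{\operatorname{cov}(ax+b,\; cy+d)}{\operatorname{sd}(ax+b)\,\operatorname{sd}(cy+d)}
= \frac{ac\,\operatorname{cov}(x,y)}{|a|\,|c|\,\operatorname{sd}(x)\,\operatorname{sd}(y)}
= \operatorname{sign}(ac)\; r_{xy}
$$

The shifts $b$ and $d$ drop out of both the covariance and the standard deviations, and the scale factors cancel up to sign. For z-scores, $a = 1/\operatorname{sd}(x)$ and $c = 1/\operatorname{sd}(y)$ are both positive, so $r_{x^*y^*} = r_{xy}$; if exactly one of $a$, $c$ were negative, the correlation would keep its magnitude but flip sign.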

Karl
2

Spearman's correlation on z-scores is the same as it is on raw scores. Here's a little R code to demonstrate the idea:

> # Create two correlated random variables with means and standard deviations
> # that are clearly not z-scores (i.e., not mean = 0, sd = 1):
> set.seed(4444)
> x <- rnorm(100, mean = 100, sd = 3)
> y <- x + rnorm(100, mean = 50, sd = 2)
> 
> # Create z-score versions of the variables:
> zx <- scale(x)
> zy <- scale(y)
> 
> # Calculate Spearman's correlation on both raw and z-score versions of the
> # variables: 
> # Note that they are the same value.
> cor(x, y, method="spearman")
[1] 0.7756736
> cor(zx, zy, method="spearman")
          [,1]
[1,] 0.7756736
> 
> 
> # Note that this also holds for Pearson's correlation:
> cor(x, y, method="pearson")
[1] 0.8393452
> cor(zx, zy, method="pearson")
          [,1]
[1,] 0.8393452

...

> # Another way of thinking about it is that Pearson's correlation is 
> # equivalent to the standardised beta in a simple linear regression 
> # of one variable on the other (i.e., the regression coefficient
> # obtained when both variables are z-scores):
> coef(lm(zy~zx))[2]
       zx 
0.8393452 
> coef(lm(zx~zy))[2]
       zy 
0.8393452 
> 
> # In the context of Spearman's correlation, you can think of the 
> # correlation as the standardised regression coefficient for the
> # variables after converting each variable to ranks:
> rzx <- rank(zx)
> rzy <- rank(zy)
> 
> coef(lm(rzy~rzx))[2]
      rzx 
0.7756736 
> coef(lm(rzx~rzy))[2]
      rzy 
0.7756736 
> 
Jeromy Anglim