143

I get this question frequently enough in my statistics consulting work that I thought I'd post it here. I have an answer, which is posted below, but I was keen to hear what others have to say.

Question: If you have two variables that are not normally distributed, should you use Spearman's rho for the correlation?

gung - Reinstate Monica
Jeromy Anglim
  • Why not calculate and report **both** (Pearson's r *and* Spearman's ρ)? Their difference (or lack thereof) will provide additional information. –  Sep 09 '15 at 07:51
  • A question comparing the distributional assumptions made when we test a simple regression coefficient beta for significance and when we test a Pearson correlation coefficient (numerically equal to the beta): http://stats.stackexchange.com/q/181043/3277. – ttnphns Nov 29 '15 at 09:36
  • Pearson's correlation is linear, Spearman's is monotonic, so they're not normally used for the same purpose. The Pearson coefficient doesn't need you to assume normality. There's a test for it that does assume normality, but you don't have only that option. – Glen_b Feb 24 '20 at 08:03

6 Answers

104

Pearson's correlation is a measure of the linear relationship between two continuous random variables. It does not assume normality although it does assume finite variances and finite covariance. When the variables are bivariate normal, Pearson's correlation provides a complete description of the association.

Spearman's correlation applies to ranks and so provides a measure of a monotonic relationship between two continuous random variables. It is also useful with ordinal data and is robust to outliers (unlike Pearson's correlation).

The distribution of either correlation coefficient will depend on the underlying distribution, although both are asymptotically normal because of the central limit theorem.
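To make the distinction concrete, here is a minimal sketch (in Python with NumPy/SciPy, added purely as an illustration and not part of the original answer) comparing the two coefficients on a monotonic but non-linear relationship between non-normal variables; the specific distributions are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Skewed (non-normal) x, and y a monotonic but non-linear function of x plus noise
x = rng.exponential(scale=1.0, size=500)
y = np.exp(x) + rng.normal(scale=0.5, size=500)

pearson_r, _ = stats.pearsonr(x, y)      # strength of *linear* association
spearman_rho, _ = stats.spearmanr(x, y)  # strength of *monotonic* association (via ranks)

print(f"Pearson's r:    {pearson_r:.3f}")
print(f"Spearman's rho: {spearman_rho:.3f}")
# Spearman's rho is close to 1 (the relationship is monotonic),
# while Pearson's r is noticeably smaller (the relationship is not linear).
```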

Rob Hyndman
  • Pearson's $\rho$ does not assume normality, but is only an exhaustive measure of association if the joint distribution is multivariate normal. Given the confusion this distinction elicits, you might want to add it to your answer. – user603 Oct 19 '10 at 07:42
  • @kwak. Good point. I'll update the answer. – Rob Hyndman Oct 19 '10 at 07:45
  • Is there a source that can be quoted to support the above statement (Pearson's r does not assume normality)? We're having the same argument in our department at the moment. –  Feb 01 '12 at 14:52
  • @RobHyndman In the field of financial time series (for example when trying to learn about correlations between stock returns), would you recommend Pearson correlation or rank based correlations? Wikipedia is pretty strongly against Pearson but their source is dubious. – Jase Dec 27 '12 at 05:57
  • *"When the variables are bivariate normal, Pearson's correlation provides a complete description of the association."* And when the variables are NOT bivariate normal, how useful is Pearson's correlation? – landroni Sep 16 '14 at 10:07
  • Here: http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/ they say that "For the Pearson r correlation, both variables should be normally distributed. Other assumptions include linearity and homoscedasticity" – skan Mar 09 '15 at 02:15
  • This answer seems rather indirect. "When the variables are bivariate normal ..." And when not? This kind of explanation is why I never get statistics. "Rob, how do you like my new dress?" "The dark color emphasizes your light skin." "Sure, Rob, but do you *like* how it emphasizes my skin?" "Light skin is considered beautiful in many cultures." "I know, Rob, but do *you* like it?" "I think the dress is beautiful." "I think so, too, Rob, but is it beautiful *on me*?" "You always look beautiful to me, honey." *sigh* –  Aug 12 '15 at 11:19
  • If you read the two sentences before that, you will find the answer. – Rob Hyndman Aug 12 '15 at 12:50
  • Although the asymptotic distributions of the correlations are normal, the variances of those normal distributions depend on the unknown population parameters. In the sense of inference, we do require bivariate normality for Pearson's correlation. – Randy Lai Sep 26 '16 at 14:57
  • No, we don't. It's quite possible to do inference for Pearson's correlation without assuming bivariate normality, in at least four different ways. (i) use asymptotic results -- already mentioned above; (ii) make some other parametric distributional assumption and derive or simulate the null distribution of the test statistic; (iii) use a permutation test; (iv) use a bootstrap test. There are probably other approaches – Glen_b Oct 06 '19 at 03:46
  • These answers all show what is wrong with today's statistics education. The CLT does NOT guarantee your data will converge to normal. In fact, in almost all cases it will NOT. Every answer here is circular because it assumes normality is something real-world data tends towards, which it does NOT. Most real-world data will be fat-tailed, meaning its moments are extremely ill-defined, or don't exist period. Convergence is either slow or non-existent. Pearson's correlation is used out of convenience, not because it is a robust measure, which it is NOT. – Cybernetic Feb 17 '20 at 15:11
  • This answer seems dangerous to me since it may encourage unaware statistics users to think no normality is needed to make sense of the Pearson coefficient. Just go back to basic maths: if both random variables behind your data have finite variance, then it follows from the law of large numbers that the Pearson coefficient will converge to the true correlation as the number of data observations grows to infinity. But what is the error made for a finite data set? This depends on the higher moments of your random variables. – Student Jan 12 '21 at 10:37
  • [Part II] IF you have normality, then you have an explicit confidence interval leading to a proper statistical test, p-values, etc. But otherwise you DO NOT, a priori. Try the experiment with two i.i.d. samples of a Student-t variable with 3 degrees of freedom; it's instructive. It's not because you can compute something without a normality assumption that this output tells you something about your data! – Student Jan 12 '21 at 10:40
59

Don't forget Kendall's tau! Roger Newson has argued for the superiority of Kendall's $\tau_a$ over Spearman's correlation $r_S$ as a rank-based measure of correlation in a paper whose full text is now freely available online:

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. Stata Journal 2002; 2(1):45-64.

He references (on p47) Kendall & Gibbons (1990) as arguing that "...confidence intervals for Spearman’s rS are less reliable and less interpretable than confidence intervals for Kendall’s τ-parameters, but the sample Spearman’s rS is much more easily calculated without a computer" (which is no longer of much importance of course). Unfortunately I don't have easy access to a copy of their book:

Kendall, M. G. and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. London: Griffin.
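Whichever rank coefficient you prefer, all three measures are a one-liner in standard software these days; here is a minimal sketch (Python/SciPy, my own choice of tooling rather than anything from the answer or the references above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)

r, _ = stats.pearsonr(x, y)      # linear association
rho, _ = stats.spearmanr(x, y)   # monotonic association via ranks
tau, _ = stats.kendalltau(x, y)  # concordance-based rank association

print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
# Note that tau is typically smaller in magnitude than rho on the same data;
# the two are on different scales and should not be compared number-for-number.
```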

onestop
  • I'm also a big fan of Kendall's tau. Pearson is far too sensitive to influential points/outliers for my taste, and while Spearman doesn't suffer from this problem, I personally find Kendall easier to understand, interpret and explain than Spearman. Of course, your mileage may vary. – Stephan Kolassa Oct 19 '10 at 07:44
  • My recollection from experience is that Kendall's tau still runs a lot slower (in R) than Spearman's. This can be important if your dataset is large. – wordsforthewise May 25 '18 at 18:07
44

From an applied perspective, I am more concerned with choosing an approach that summarises the relationship between two variables in a way that aligns with my research question. I think that determining a method for getting accurate standard errors and p-values is a question that should come second. Even if you choose not to rely on asymptotics, there's always the option to bootstrap or change distributional assumptions.

As a general rule, I prefer Pearson's correlation because (a) it generally aligns more with my theoretical interests; (b) it enables more direct comparability of findings across studies, because most studies in my area report Pearson's correlation; and (c) in many settings there is minimal difference between Pearson and Spearman correlation coefficients.

However, there are situations where I think Pearson's correlation on raw variables is misleading.

  • Outliers: Outliers can have great influence on Pearson's correlations. Many outliers in applied settings reflect measurement failures or other factors that the model is not intended to generalise to. One option is to remove such outliers. Univariate outliers do not exist with Spearman's rho because everything is converted to ranks. Thus, Spearman is more robust.
  • Highly skewed variables: When correlating skewed variables, particularly highly skewed variables, a log or some other transformation often makes the underlying relationship between the two variables clearer (e.g., brain size by body weight of animals). In such settings it may be that the raw metric is not the most meaningful metric anyway. Spearman's rho has a similar effect to transformation by converting both variables to ranks. From this perspective, Spearman's rho can be seen as a quick-and-dirty approach (or more positively, it is less subjective) whereby you don't have to think about optimal transformations.

In both cases above, I would advise researchers to either consider adjustment strategies (e.g., transformations, outlier removal/adjustment) before applying Pearson's correlation or use Spearman's rho.
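As a rough illustration of both situations, here is a hypothetical sketch in Python that I'm adding for concreteness; the data and numbers are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# (1) Outliers: a single wild point can dominate Pearson's r but barely moves Spearman's rho
x = rng.normal(size=50)
y = x + rng.normal(scale=0.5, size=50)
x_out = np.append(x, 10.0)   # one extreme (e.g. mis-recorded) observation
y_out = np.append(y, -10.0)
print("with outlier: Pearson %.2f, Spearman %.2f"
      % (stats.pearsonr(x_out, y_out)[0], stats.spearmanr(x_out, y_out)[0]))

# (2) Skew: log-transforming highly skewed variables often recovers a near-linear relationship
a = rng.lognormal(mean=0.0, sigma=1.0, size=200)     # e.g. body weight
b = a ** 0.75 * rng.lognormal(sigma=0.2, size=200)   # e.g. brain size (power-law relationship)
print("raw scale: Pearson %.2f" % stats.pearsonr(a, b)[0])
print("log-log:   Pearson %.2f" % stats.pearsonr(np.log(a), np.log(b))[0])
print("Spearman:  %.2f" % stats.spearmanr(a, b)[0])
# Spearman's rho on the raw data is close to Pearson's r on the log-log scale,
# which is the "quick-and-dirty transformation" effect described above.
```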

Jeromy Anglim
  • The problem with transformation is that, in general, it also transforms the errors associated with each point, and thus the weights. And it doesn't solve the outlier problem. – skan Mar 09 '15 at 02:09
  • The previous comment is puzzling. Transformation often tames outliers. Also, what to think about errors depends on the scale you choose for analysis. If a logarithmic scale makes sense, for example, additive errors on that scale often make sense too. – Nick Cox Mar 15 '20 at 10:19
13

Updated

The question asks us to choose between Pearson's and Spearman's method when normality is in question. Restricted to this concern, I think the following paper should inform anyone's decision: Kowalski (1972), on the effects of non-normality on the distribution of the sample product-moment correlation coefficient, http://www.jstor.org/pss/2346598.

It's quite nice and provides a survey of the considerable literature, spanning decades, on this topic -- starting from Pearson's "mutilated and distorted surfaces" and robustness of distribution of $r$. At least part of the contradictory nature of the "facts" is that much of this work was done before the advent of computing power -- which complicated things because the type of non-normality had to be considered and was hard to examine without simulations.

Kowalski's analysis concludes that the distribution of $r$ is not robust in the presence of non-normality and recommends alternative procedures. The entire paper is quite informative and recommended reading, but skip to the very short conclusion at the end of the paper for a summary.

If asked to choose between Spearman's and Pearson's method when normality is violated, the distribution-free alternative is worth advocating, i.e. Spearman's method.


Previously ..

Spearman's correlation is a rank based correlation measure; it's non-parametric and does not rest upon an assumption of normality.

The sampling distribution for Pearson's correlation does assume normality; in particular this means that although you can compute it, conclusions based on significance testing may not be sound.

As Rob points out in the comments, with large samples this is not an issue. With small samples, though, where normality is violated, Spearman's correlation should be preferred.

Update Mulling over the comments and the answers, it seems to me that this boils down to the usual non-parametric vs. parametric tests debate. Much of the literature, e.g. in biostatistics, doesn't deal with large samples. I'm generally not cavalier with relying on asymptotics. Perhaps it's justified in this case, but that's not readily apparent to me.
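To make the small-sample concern concrete, here is a minimal sketch of one distribution-free workaround mentioned in the comments (a permutation test for Pearson's correlation); it's written in Python as an illustration, and the heavy-tailed example data are my own invention:

```python
import numpy as np
from scipy import stats

def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for Pearson's correlation.

    Shuffling y breaks any association with x, so the permuted correlations
    approximate the null distribution without assuming bivariate normality.
    """
    rng = np.random.default_rng(seed)
    r_obs = stats.pearsonr(x, y)[0]
    perm_r = np.array([stats.pearsonr(x, rng.permutation(y))[0]
                       for _ in range(n_perm)])
    return r_obs, float(np.mean(np.abs(perm_r) >= abs(r_obs)))

# Small sample from a heavy-tailed (decidedly non-normal) distribution
rng = np.random.default_rng(3)
x = rng.standard_t(df=2, size=15)
y = 0.5 * x + rng.standard_t(df=2, size=15)
r, p = permutation_pvalue(x, y)
print(f"observed r = {r:.3f}, permutation p-value = {p:.4f}")
```

The same idea extends to a bootstrap confidence interval by resampling (x, y) pairs with replacement instead of permuting y.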

ars
  • No. Pearson's correlation does NOT assume normality. It is an estimate of the correlation between any two continuous random variables and is a consistent estimator under relatively general conditions. Even tests based on Pearson's correlation do not require normality if the samples are large enough because of the CLT. – Rob Hyndman Oct 19 '10 at 01:46
  • I am under the impression that Pearson is defined as long as the underlying distributions have finite variances and covariances. So, normality is *not* required. If the underlying distributions are not normal then the test-statistic may have a different distribution but that is a secondary issue and not relevant to the question at hand. Is that not so? –  Oct 19 '10 at 01:47
  • @Rob, @Srikant: True, I was thinking of significance testing. – ars Oct 19 '10 at 01:59
  • @Srikant: I'm not sure it's a "secondary issue". You can compute anything after all -- it's the inference that matters. @Rob: your "if" qualifier is key here -- it seems to me that's central to this question. We can justify a whole lot with asymptotic hand waving; exceptions matter. – ars Oct 19 '10 at 05:54
  • @ars,@Srikant. Even with small samples, you can still do inference on correlations, but not using the asymptotic normality result. – Rob Hyndman Oct 19 '10 at 11:41
  • @Rob: Sure, but it seems this is where one should advocate Spearman's method over Pearson's. For example suppose small samples where X is normal but Y isn't -- you can compare the two on even terms with ranking methods such as Spearman's. Using Pearson's requires more work, for example, finding an appropriate transformation. – ars Oct 19 '10 at 14:18
  • @ars. You can just use Monte Carlo methods or a bootstrap. Not much work in that, just computation. – Rob Hyndman Oct 19 '10 at 22:24
  • @Rob: Yes, we can always come up with workarounds to make things work out roughly the same -- simply to avoid Spearman's method, which most non-statisticians can handle with a standard command. I guess my advice remains to use Spearman's method for small samples where normality is questionable. Not sure if that's in dispute here or not. – ars Oct 19 '10 at 23:43
  • @ars. I would use Spearman's if I was interested in monotonic rather than linear association, or if there were outliers or high levels of skewness. I would use Pearson's for linear relationships provided there are no outliers. I don't think the sample size is relevant in making the choice. – Rob Hyndman Oct 20 '10 at 00:32
  • @Rob: OK, thanks for the discussion. I agree with the first part, but doubt the last, and would include that size only plays a role because normal asymptotics don't apply. For example, Kowalski 1972 has a pretty good survey of the history around this, and concludes that the Pearson's correlation is not as robust as thought. See: http://www.jstor.org/pss/2346598 – ars Oct 20 '10 at 01:00
3

I think these figures (of gross-error sensitivity and asymptotic variance) and the quotation from the paper below will make it a bit clearer:

[Figures from Croux & Dehon (2010): gross-error sensitivity and asymptotic variance of the correlation measures]

"The Kendall correlation measure is more robust and slightly more efficient than Spearman’s rank correlation, making it the preferable estimator from both perspectives."

Source: Croux, C. and Dehon, C. (2010). Influence functions of the Spearman and Kendall correlation measures. Statistical Methods and Applications, 19, 497-515.
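A rough empirical companion to the gross-error sensitivity figures (my own sketch in Python, not taken from the paper): contaminate a clean bivariate sample with a single gross error and see how much each coefficient moves.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # population correlation is 0.8

def all_three(x, y):
    return (stats.pearsonr(x, y)[0],
            stats.spearmanr(x, y)[0],
            stats.kendalltau(x, y)[0])

clean = all_three(x, y)

# Replace one observation with a gross error far from the bulk of the data
x_c, y_c = x.copy(), y.copy()
x_c[0], y_c[0] = 8.0, -8.0
contaminated = all_three(x_c, y_c)

for name, c0, c1 in zip(("Pearson", "Spearman", "Kendall"), clean, contaminated):
    print(f"{name:8s}: clean {c0:.3f} -> contaminated {c1:.3f}")
# The rank-based coefficients barely move; Pearson's r changes much more.
```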

Krishna
1

Even though this is an age-old question, I would like to contribute the (cool) observation that Pearson's $\rho$ is nothing but the slope of the trend line between $Y$ and $X$ after the means have been removed and both variables have been rescaled by their standard deviations, i.e. Pearson's $r$ is the least-squares solution of $\tilde Y=\tilde X\hat\beta$ where $\tilde Y = Y / \sigma_Y$ and $\tilde X = X / \sigma_X$.

This leads to a quite easy decision rule between the two: plot $Y$ against $X$ (a simple scatter plot) and add a trend line. If the trend looks out of place, then don't use Pearson's $\rho$. Bonus: you get to visualize your data, which is never a bad thing.

If you aren't comfortable with Pearson's $\rho$, then Spearman's rank makes this a bit better because it rescales both the x-axis and the y-axis in a non-linear way (rank encoding) and then fits the trend line in the embedded (transformed) space. In practice, this seems to work well and it does improve robustness towards outliers or skew as others have pointed out.

In theory, I do think Spearman's rank is a bit funny though, because rank encoding is a transformation that maps real numbers onto a discrete sequence of numbers. Fitting a linear regression to discrete numbers doesn't quite make sense (they are discrete), so what is happening is that we re-embed the sequence into the real numbers using their natural embedding and fit a regression in that space instead. This seems to work well enough in practice, but I do find it funny.

Instead of using Spearman's rank, it may be better to just commit to the rank encoding and go with Kendall's $\tau$ instead; even though we lose the relationship with Pearson's $\rho$.
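Since the argument above describes Spearman's rho as "rank-encode both variables, then fit the usual trend line", here is a minimal check (Python, my own illustration) that Spearman's rho is literally Pearson's r computed on the ranks:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.exponential(size=100)
y = np.sqrt(x) + rng.normal(scale=0.1, size=100)

rho_direct = stats.spearmanr(x, y)[0]
rho_via_ranks = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))[0]
tau = stats.kendalltau(x, y)[0]

print(np.isclose(rho_direct, rho_via_ranks))  # True: the two computations agree
print(f"Spearman rho = {rho_direct:.4f}, Kendall tau = {tau:.4f}")
```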


Pearson's $\rho$ from Least-Squares

We can start with the desire to fit a linear regression model $Y=X\hat\beta + b$ to our observations using least squares. Here $X$ is a vector of observations and $Y$ is another vector of matching observations. If we are happy to assume that $X$ and $Y$ have had their means removed ($\mu_X=\mu_Y=0$, easy enough to do), then we can reduce the model to $Y=X\hat\beta$. For this, there exists a closed-form solution $\hat\beta=(X^TX)^{-1}X^TY$.

In this vector notation (and with the means removed) $\text{Cov}(X, Y) = E[XY]-E[X]E[Y] = E[XY] = \tfrac{1}{n}X^TY$, and similarly $\text{Var}(X) = \text{Cov}(X, X) = \tfrac{1}{n}X^TX = \sigma_X^2$. If we now rewrite $\hat\beta$ in these terms, the factors of $\tfrac{1}{n}$ cancel and we get $\hat\beta = \frac{\text{Cov}(X,Y)}{\text{Var}(X)} = \frac{\text{Cov}(X,Y)}{\sigma_X^2}$.

Plugging this back into the model and rescaling both variables by their standard deviations results in $\frac{Y}{\sigma_Y} = \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}\,\frac{X}{\sigma_X}$, where the slope is exactly Pearson's $\rho$. The rescaling by $\sigma_X$ and $\sigma_Y$ is expected, since we are interested in a variance-normalized (scale-free) coefficient.
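A quick numerical check of this identity (my own sketch, not part of the original derivation): the least-squares slope computed on the standardized variables agrees with NumPy's correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)

# Remove means and rescale to unit standard deviation
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()

# Closed-form least-squares slope for the no-intercept model ys = xs * beta
beta = (xs @ ys) / (xs @ xs)

print(beta, np.corrcoef(x, y)[0, 1])  # the two numbers agree (up to floating point)
```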

FirefoxMetzger
  • Fascinating, I never realized this connection. How does Cov(X,Y) = E[XY] - E[X]E[Y]? – saeranv Jul 22 '21 at 05:32
  • @saeranv It is one of the ways to define covariance (or follows quickly from your chosen definition): https://en.wikipedia.org/wiki/Covariance#Definition – FirefoxMetzger Jul 22 '21 at 08:12
  • thanks, so obvious, I should have worked it out for myself! I have another thought/question: I am trying to think of an intuitive reason for why $Y$ is equal to $\text{Cov}(X,Y)$ normalized by $\sigma_X$ but not $\sigma_Y$? Would it be accurate to say that regression is equal to the $X$ feature vectors shrunk by an "angle factor" between $X$ and $Y$ (since $X \cdot Y = \cos(\theta_{XY})$), and then scaled by the standard deviation of $Y$? – saeranv Jul 23 '21 at 19:58