
I've been looking for an expression for the expected value and variance of the sample correlation coefficient. Most of the sources I've found give $$ \operatorname{Var}(\operatorname{Cor}(X, Y)) \approx \frac{(1-\rho^2)^2}{n-1} $$ as the variance of the sample correlation coefficient, but this assumes that $X$ and $Y$ follow a bivariate normal distribution.
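A quick way to see what the normal-theory formula above does and does not promise is to simulate it. The sketch below draws bivariate normal samples, computes Pearson's $r$ for each, and compares the Monte Carlo variance with $(1-\rho^2)^2/(n-1)$. It assumes NumPy; the function names, $\rho = 0.5$, $n = 50$ and the number of replications are arbitrary illustrative choices.

```python
import numpy as np


def simulate_r_variance(rho, n, reps=20000, seed=0):
    """Monte Carlo variance of Pearson's r for bivariate normal samples of size n."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Shape (reps, n, 2): `reps` independent samples, each of n bivariate-normal pairs.
    data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=(reps, n))
    x, y = data[..., 0], data[..., 1]
    # Center each sample, then compute r row by row.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return r.var(ddof=1)


def normal_theory_variance(rho, n):
    """Large-sample variance of r under bivariate normality."""
    return (1 - rho**2) ** 2 / (n - 1)


if __name__ == "__main__":
    print(simulate_r_variance(0.5, 50), normal_theory_variance(0.5, 50))
```

For moderate $n$ the two numbers should agree to within a few percent (the approximation drops terms of order $1/n^2$); with non-normal data that agreement is no longer guaranteed.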

There also seem to be several series-expansion approaches for approximating the moments of the sample correlation coefficient. However, it has not been clear to me what their assumptions are (e.g., normality), nor which expression is the most up to date.
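One series-expansion route that does not assume normality is the first-order delta method: expand $r$ around the population second moments and keep the $1/n$ term. Writing $\mu_{ij} = E[Z_X^i Z_Y^j]$ for the standardized moments ($Z_X = (X - E X)/\sigma_X$, likewise $Z_Y$) and assuming i.i.d. pairs with finite fourth moments, this gives
$$ \operatorname{Var}(r) \approx \frac{1}{n}\left[\mu_{22} + \frac{\rho^2}{4}\left(\mu_{40} + \mu_{04} + 2\mu_{22}\right) - \rho\left(\mu_{31} + \mu_{13}\right)\right]. $$
Under bivariate normality $\mu_{40} = \mu_{04} = 3$, $\mu_{22} = 1 + 2\rho^2$ and $\mu_{31} = \mu_{13} = 3\rho$, and the bracket collapses to $(1-\rho^2)^2$, i.e. the normal-theory formula with $n$ in place of $n-1$. The Python sketch below plugs sample moments into this expression. It is only an illustration (the function name is hypothetical, NumPy is assumed), not a formula taken from any of the sources cited here, and it says nothing about the expected value of $r$.

```python
import numpy as np


def delta_method_var_r(x, y):
    """First-order (delta-method) approximation to Var(r).

    Assumes i.i.d. pairs with finite fourth moments; no normality assumption.
    Population moments are replaced by plug-in sample moments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Standardize with ddof=0 so that mean(zx * zy) equals Pearson's r exactly.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    rho = np.mean(zx * zy)
    m22 = np.mean(zx**2 * zy**2)
    m40 = np.mean(zx**4)
    m04 = np.mean(zy**4)
    m31 = np.mean(zx**3 * zy)
    m13 = np.mean(zx * zy**3)
    # Asymptotic variance of sqrt(n) * (r - rho), then scaled back by 1/n.
    avar = m22 + rho**2 / 4 * (m40 + m04 + 2 * m22) - rho * (m31 + m13)
    return avar / n
```

For bivariate normal data, `n * delta_method_var_r(x, y)` should be close to $(1 - r^2)^2$; for non-normal data it can differ in either direction, which is the point of using the general moments.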

So, does anyone know of an expression (approximate or not) for the expected value and variance of Pearson's correlation coefficient that does not assume a particular distribution for the random variables?

Update:

Some of my sources:

Assumes bivariate normal distribution:

Published works:

Hotelling (1953): New Light on the Correlation Coefficient and its Transforms. (http://www.jstor.org/stable/2983768)

Fisher (1921): (https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15169/1/14.pdf)

Web sources:

Gerstman (http://www.sjsu.edu/faculty/gerstman/StatPrimer/correlation.pdf)

Stack Exchange (Standard error from correlation coefficient)

Wikipedia (https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Inference)

Holland (http://strata.uga.edu/6370/lecturenotes/correlation.html)

Does not state the bivariate normality assumption, but it presumably applies:

Stack Overflow (https://stackoverflow.com/questions/16097453/how-to-compute-p-value-and-standard-error-from-correlation-analysis-of-rs-cor)

Unfortunately, I don't understand this one, but it seems like a fruitful approach:

Hawkins (1989) - Using U Statistics to Derive the Asymptotic Distribution of Fisher's Z Statistic (http://www.jstor.org/stable/2685369)
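The Hawkins title refers to Fisher's Z statistic, $z = \operatorname{atanh}(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r}$. Since $dz/dr = 1/(1-r^2)$, the delta method turns the normal-theory $\operatorname{Var}(r) \approx (1-\rho^2)^2/(n-1)$ into $\operatorname{Var}(z) \approx 1/(n-1)$, free of $\rho$; Fisher's refinement uses $1/(n-3)$. The sketch below builds a confidence interval for $\rho$ on the $z$ scale and maps it back with $\tanh$. It assumes bivariate normality (exactly the assumption the question wants to drop) and a 95% level; `fisher_z_interval` is a hypothetical helper name.

```python
import math


def fisher_z_interval(r, n, z_crit=1.959963984540054):
    """Approximate confidence interval for rho via Fisher's z transform.

    Assumes bivariate normal data; z_crit defaults to the 97.5% standard-normal
    quantile, giving a two-sided 95% interval.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)  # Fisher's large-sample standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

The interval is symmetric on the $z$ scale but not on the $r$ scale, which reflects the skewness of $r$ when $\rho \neq 0$.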

Anvit
Tommy L
  • "Most of the sources" Can you list them? I'm also interested in those results, even assuming a bivariate normal distribution. Thx. – mic Feb 15 '16 at 17:46
  • I've updated my question with a list of some of the sources I found. I would be very grateful if you would add an answer when/if you get any closer to one ;-) – Tommy L Feb 16 '16 at 08:15
  • The large-sample variance of a correlation coefficient is $\frac{(1 - \rho^2)^2}{n-1}$ (or just $n$ in the denominator). You are missing a square there. What to put in the denominator is debatable, but since this is a large-sample approximation anyway, it hardly matters. – Wolfgang Feb 16 '16 at 09:11
  • @Wolfgang: But that expression assumes the variables are bivariate normal, doesn't it? – Tommy L Feb 16 '16 at 09:13
  • Yes, this was just a small correction. I'll post an answer to your question shortly. I cannot give you one equation that covers the non-normal case, but I can give you several references that do go into this. – Wolfgang Feb 16 '16 at 09:15
  • @Wolfgang: Do you have a reference for that? All I can find is $SE=\sqrt{\frac{1-\rho^2}{n-2}}$, which when squared gives the variance as I wrote in my question. – Tommy L Feb 16 '16 at 09:15
  • See, for example, Hotelling (1953), p. 212, the equation for $\sigma_r^2$ (if you cut off all terms after $1/n$). – Wolfgang Feb 16 '16 at 09:24
  • Ok, but in Hotelling's paper, it is written (on page 195) that the variables are assumed to come from a bivariate normal distribution. Is this a weak assumption that can be ignored gracefully, or have I missed something? – Tommy L Feb 16 '16 at 13:04
  • Yes, that equation applies to the bivariate normal case. I was making a small correction, not giving an equation for the general case. For that, you will have to dig into the references I posted. – Wolfgang Feb 16 '16 at 18:23
  • You can also find $SE_r =\frac{1-r^2}{\sqrt{n-2}}$ in my answer (https://stats.stackexchange.com/a/375616/178923), Elston 1975 p. 136 (https://www.researchgate.net/publication/267137971_On_the_Correlation_Between_Correlations), Fisher 1921 p. 222 (https://digital.library.adelaide.edu.au/dspace/handle/2440/15169). – Jean Paul Aug 28 '20 at 14:39

1 Answer


I cannot give you one expression, but here are several articles that cover some non-normal cases:

Browne, M. W., & Shapiro, A. (1986). The asymptotic covariance matrix of sample correlation coefficients under general conditions. Linear Algebra and its Applications, 82, 169-176.

Gayen, A. K. (1951). The frequency distribution of the product-moment correlation coefficient in random samples of any size drawn from non-normal universes. Biometrika, 38, 219-247.

Kowalski, C. (1972). On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. Applied Statistics, 21, 1-12.

Subrahmaniam, K., & Gajjar, A. V. (1980). Robustness to nonnormality of some transformations of the sample correlation coefficient. Journal of Multivariate Analysis, 10, 60-77.

Yuan, K.-H., & Bentler, P. M. (2000). Inferences on correlation coefficients in some classes of nonnormal distributions. Journal of Multivariate Analysis, 72, 230-248.
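These papers study how non-normality changes the sampling distribution of $r$. As a small illustration of why the normal-theory variance cannot simply be reused, here is a toy construction of our own (not taken from any of these papers): $X = SU$, $Y = SV$ with $U, V$ independent standard normal and a shared random scale $S \in \{1, 3\}$, each with probability 1/2. Then $\rho = 0$, yet $X$ and $Y$ are dependent. To first order in $1/n$, when $\rho = 0$ the variance of $r$ is $E[X^2Y^2]/(n\,\sigma_X^2\sigma_Y^2) = (E[S^4]/E[S^2]^2)/n = (41/25)/n$, rather than the normal-theory $1/(n-1)$. The sketch assumes NumPy; the function names, $n = 100$ and the replication count are arbitrary.

```python
import numpy as np


def pearson_r_rows(x, y):
    """Pearson's r computed row-wise for arrays of shape (reps, n)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))


def scale_mixture_var_r(n=100, reps=20000, seed=0):
    """Monte Carlo Var(r) for X = S*U, Y = S*V with a shared random scale S.

    U, V are independent N(0, 1); S is 1 or 3 with equal probability.
    X and Y are uncorrelated (rho = 0) but not independent and not bivariate normal.
    """
    rng = np.random.default_rng(seed)
    s = rng.choice([1.0, 3.0], size=(reps, n))
    u = rng.standard_normal((reps, n))
    v = rng.standard_normal((reps, n))
    r = pearson_r_rows(s * u, s * v)
    return r.var(ddof=1)


n = 100
simulated = scale_mixture_var_r(n)
normal_theory = 1 / (n - 1)      # (1 - rho^2)^2 / (n - 1) with rho = 0
mixture_theory = (41 / 25) / n   # first-order value E[S^4] / E[S^2]^2 / n
print(simulated, normal_theory, mixture_theory)
```

With these settings the simulated variance should land near $0.0164$, roughly 60% above the normal-theory value of about $0.0101$; a more spread-out scale $S$ widens the gap.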

Wolfgang