
When we calculate the correlation between two random variables, we seem to be confident that the estimated correlation is 100% accurate, since I have never heard of estimation error being a problem for correlation. We take its value as is, without question.

But do correlation estimates have larger errors for small samples? If so, why, and how can one demonstrate that the estimate is off?
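
One way to demonstrate the extra error at small sample sizes is a Monte Carlo simulation: draw many samples of size n from a population whose correlation is known, compute the sample correlation r for each sample, and look at how widely r scatters around the true value. The sketch below is only an illustration under stated assumptions: it assumes bivariate-normal data with a population correlation of 0.5, uses the arbitrary sample sizes 10, 25, 100 and 250, and relies on hypothetical helper names (`pearson_r`, `simulate_r`) and the Python standard library only.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(rho, n, reps, rng):
    """Return `reps` sample correlations, each from n bivariate-normal draws.

    Assumption: y = rho * x + sqrt(1 - rho^2) * noise, which makes the
    population correlation of (x, y) exactly rho.
    """
    noise_scale = math.sqrt(1 - rho ** 2)
    rs = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rho * a + noise_scale * rng.gauss(0.0, 1.0) for a in x]
        rs.append(pearson_r(x, y))
    return rs

rng = random.Random(2020)  # fixed seed so the run is reproducible
RHO = 0.5                  # illustrative population correlation
results = {}               # n -> (mean of r, standard deviation of r)
for n in (10, 25, 100, 250):
    rs = simulate_r(RHO, n, 1000, rng)
    results[n] = (statistics.fmean(rs), statistics.stdev(rs))
    mean_r, sd_r = results[n]
    print(f"n={n:4d}  mean r = {mean_r:.3f}  sd(r) = {sd_r:.3f}  "
          f"approx (1 - rho^2)/sqrt(n) = {(1 - RHO**2) / math.sqrt(n):.3f}")
```

The spread of r across repeated samples is the estimation error, and in this setup it shrinks roughly like $1/\sqrt{n}$. How large n must be before the estimate counts as "stable" depends on how much spread you are willing to tolerate.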

develarist
  • Please clarify the setting: if these "variables" are random variables, stipulated mathematically, then error is not an issue. If they are instead observations of random variables, then of course almost any statistic computed from them is a random variable, too; in the case of the Pearson correlation coefficient, any research will reward you with the discovery of a vast literature going back to the 19th century concerning its estimation and assessing the error of that estimate. – whuber Sep 29 '20 at 21:13
  • Could you point me toward sources that specifically analyze the error in correlation estimates caused by small samples? I have edited the question. – develarist Sep 29 '20 at 21:19
  • This search assumes no knowledge: https://www.google.com/search?q=pearson+correlation+hypothesis+test. This one assumes a little bit more: https://www.google.com/search?q=fisher+z+transformation; any authoritative account will link back to the classic literature of the early 20th century. – whuber Sep 29 '20 at 21:21
  • You seem to be saying that the standard errors on correlation values are zero. You know this is false. Could you please clarify? – Dave Sep 29 '20 at 22:06
  • Naturally *sample* correlation is subject to random variation, as with essentially any other statistic. The sample values change with each new sample, and the value of any statistic changes with them. The standard error of the sample correlation is a function of sample size (again, as with essentially any other statistic). – Glen_b Sep 30 '20 at 04:46
  • If the standard error of the sample correlation is a function of sample size, that could be the beginning of an answer to the question. – develarist Sep 30 '20 at 04:51
  • The following article says sample correlation estimates stabilize at a sample size of 250, while other sources say 25 observations are already enough. Which is it? https://towardsdatascience.com/sample-size-and-correlation-eb1581227ce2 – develarist Sep 30 '20 at 05:03
  • When you say that we are confident in our calculation being 100% accurate, are you talking about the random variables (population)? – Dave Sep 30 '20 at 14:20
  • No, I mean the sample correlation estimate. – develarist Sep 30 '20 at 19:45
  • What did you read that has you thinking we perfectly pin down the correlation? – Dave Sep 30 '20 at 20:24
  • because estimation error of correlation is never discussed – develarist Sep 30 '20 at 21:11
  • Re "never discussed": maybe it's time to search our site? https://stats.stackexchange.com/questions/226380 – whuber Oct 01 '20 at 13:14
  • `cor.test` in R gives a confidence interval for the correlation coefficient, too. – Dave Oct 01 '20 at 13:48
  • @whuber I guess the answer lies in $n$ being in the denominator of $\operatorname{Var}(r)$, doesn't it? – develarist Oct 01 '20 at 14:32
  • I don't totally follow what you mean by $n$ being in the denominator, as $n$ is in the denominator of the sample mean, too, and you know that means aren't estimated with certainty. – Dave Oct 01 '20 at 14:34
  • As $n$ goes up, $se(r)$ goes down. – develarist Oct 01 '20 at 14:38
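
The comments point to Fisher's z transformation and to the confidence interval that R's `cor.test` reports. For roughly bivariate-normal data, the large-sample variance of $r$ is approximately $(1-\rho^2)^2/n$, so $se(r)$ does fall as $n$ grows. Fisher's $z = \operatorname{atanh}(r)$ is close to normal with standard error $1/\sqrt{n-3}$, which gives a simple interval; as far as I know, the interval `cor.test` reports for Pearson's $r$ is built on this same approximation. A minimal Python sketch, assuming bivariate normality and using a hypothetical helper `fisher_z_ci` with an illustrative $r = 0.5$:

```python
import math
from statistics import NormalDist

def fisher_z_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation via Fisher's z.

    Assumes roughly bivariate-normal data and n > 3: atanh(r) is treated as
    normal with standard error 1 / sqrt(n - 3), and the ends are mapped back
    to the correlation scale with tanh.
    """
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Same observed r, different sample sizes: the interval narrows as n grows.
for n in (10, 25, 100, 250):
    lo, hi = fisher_z_ci(0.5, n)
    print(f"r = 0.50, n = {n:4d}: 95% CI ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

With $r = 0.5$ and $n = 10$ the interval runs from about $-0.19$ to $0.86$, so a sample that small is compatible with anything from a weak negative to a strong positive correlation.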

0 Answers