
If the only information you have about a Pearson's correlation is its 95% confidence interval, what can you infer from that?

For example, if the 95% confidence interval for the correlation coefficient were (0.24, 0.78), what would be the best inference to make?

I don't have a strong background in stats, so if someone could explain it without lots of equations, that would be preferable. Thanks!

Jeromy Anglim
  • possible duplicate of [What, precisely, is a confidence interval?](http://stats.stackexchange.com/questions/6652/what-precisely-is-a-confidence-interval) – Nick Stauner Mar 25 '14 at 00:10
  • Thanks! Unfortunately that post doesn't really answer my question :/ –  Mar 25 '14 at 00:13
  • If that post doesn't really answer your question, @Mark, it would be helpful to use it to clarify how your Q is distinct from it. I think the answer below is somewhat ambiguous, & I worry that you may take away the wrong lesson. It would be best if you could read that thread thoroughly, & then edit your Q to state what you now understand & what you still need to know. Then you can get the best information. – gung - Reinstate Monica Mar 25 '14 at 00:43
  • I have a basic understanding of what a confidence interval is, but am wondering more about the interpretation of confidence intervals, as opposed to the definition. –  Mar 25 '14 at 01:01
  • We have a few hundred posts discussing the [interpretation of confidence intervals](http://stats.stackexchange.com/search?tab=votes&q=confidence%20interpretation). – whuber Mar 25 '14 at 16:22

2 Answers


All you can say is that the sample Pearson's correlation coefficient (r) is contained in the interval from 0.24 to 0.78. A hypothesized correlation value outside this interval would be detected as significantly different at the 5% level. What this means is that variable X has some degree of positive linear relationship to variable Y in your sample. (I hesitate to use qualitative descriptors of the "strength" of the relationship because: 1) this is a somewhat outdated way to think of it, 2) what may be a strong correlation in one discipline may be weak in another, and 3) I have no idea of the sample size used to calculate the correlation coefficient.)

If this experiment were repeated many independent times, with random sampling from the same population, then in the long run 95% of the intervals constructed this way would contain the population parameter, rho (ρ).
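
A minimal Python sketch of the long-run interpretation in the last sentence, under stated assumptions: the interval is the usual Fisher z interval for a correlation, and the data come from a bivariate normal population. The helper names (`pearson_r`, `fisher_ci`, `coverage`) and the choices ρ = 0.5, n = 27 are illustrative, not taken from the question. Each simulated sample gets its own interval; any single interval either contains ρ or does not, but the fraction that capture ρ should come out close to 0.95. Note also that r always sits inside its own interval, because the interval is built around it.

```python
import math
import random

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=Z_975):
    """Approximate 95% CI for rho via the Fisher z transformation."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)  # standard error of atanh(r) is 1/sqrt(n-3)
    return math.tanh(z - half), math.tanh(z + half)


def coverage(rho, n, reps, seed=1):
    """Fraction of simulated 95% intervals that contain the true rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Bivariate normal draw: corr(x, y) = rho in the population
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        lo, hi = fisher_ci(pearson_r(xs, ys), n)
        hits += lo <= rho <= hi
    return hits / reps


print(coverage(rho=0.5, n=27, reps=2000))  # close to 0.95, not exactly
```

With 2,000 replications the simulated coverage is only accurate to roughly ±0.01, so expect a value near 0.95 rather than exactly 0.95.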

  • Thanks! Exactly what I was looking for :) I thought it was as much, but I've been stuck with a question that made it look like you could infer a lot more! –  Mar 25 '14 at 00:16
  • Confidence has a specific statistical meaning that I would not equate with "certainty". Also, what Pearson's correlation coefficient are you referring to? If you mean the population parameter $\rho$, you're wrong. If you mean the sample statistic $r$ for future samples drawn randomly from the same population, that should be specified. – Nick Stauner Mar 25 '14 at 00:22
  • The question I'm looking at doesn't give any other information. It's asking what can you infer from that small amount of data –  Mar 25 '14 at 00:42
  • @Nick Stauner - edited for clarity. –  Mar 25 '14 at 00:42
  • Confidence intervals like this are consistent with about $n = 27, r = 0.571$, but don't take this more precisely than it deserves. I assume Fisher z transformation. – Nick Cox Mar 25 '14 at 09:42
  • (1) I can say with 100% confidence (not 95%) that the "sample Pearson's correlation coefficient" is contained in the interval $[0.24, 0.78]$. What is usually of greater interest is the value of the *population* correlation coefficient. (2) Why do you think you need to know the sample size in order to interpret this information? (3) In what specific sense is your description "outdated"? – whuber Mar 25 '14 at 16:21
  • Sample size is useful for judging the usefulness of this interval (is it a wide interval because few subjects were sampled, or a narrow interval because many subjects were sampled?). Pearson correlations are notoriously hard to interpret, and it's not clear whether this particular interval means something practically significant. Also, the sample size would be helpful for generating an effect size. There is no other context given about the sample data, so we also don't know whether the sample is biased in some way. –  Mar 25 '14 at 17:57
  • Giving these correlation values a qualitative label (small, medium, large) is meaningless without context and comparison (which we do not have here). It has the inherent limitation of being non-specific (e.g., we may have wildly different definitions of "small"), and even though a correlation may be "weak," its interpretation could be quite significant. –  Mar 25 '14 at 17:59
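
Nick Cox's estimate above (n ≈ 27, r ≈ 0.57) can be reproduced by running the Fisher z construction backwards. This is a sketch assuming the reported interval really was computed that way with a normal critical value of about 1.96, which the question does not say. On the z = atanh(r) scale the interval is symmetric around atanh(r) with half-width 1.96/√(n - 3), so the midpoint recovers r and the width recovers n. `recover_r_n` is a hypothetical helper name.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def recover_r_n(lower, upper, z_crit=Z_975):
    """Back out r and n implied by a Fisher-z confidence interval for a correlation.

    Assumes the interval is tanh(atanh(r) -/+ z_crit / sqrt(n - 3)).
    """
    z_lo, z_hi = math.atanh(lower), math.atanh(upper)
    z_mid = (z_lo + z_hi) / 2        # interval is symmetric on the z scale
    half_width = (z_hi - z_lo) / 2   # equals z_crit / sqrt(n - 3)
    r = math.tanh(z_mid)
    n = (z_crit / half_width) ** 2 + 3
    return r, n


r, n = recover_r_n(0.24, 0.78)
print(f"r is about {r:.3f}, n is about {n:.1f}")
```

Because the endpoints 0.24 and 0.78 are rounded to two decimals, the recovered values are only approximate: roughly r ≈ 0.57 and n ≈ 27.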

I make this comment from the perspective of someone who is analytical but not an expert in statistics. One reason for doing a linear regression is to find out whether the values of two variables, x and y, are independent of each other, or whether the data set instead contains evidence of some linkage between them.

If the confidence interval of "r" CONTAINS zero, that suggests that x and y are unrelated and that the calculated regression equation is of no value. If the confidence interval of "r" DOES NOT CONTAIN zero, there is reason to suspect that the value of x is somehow linked to the value of y. In this case, if you are building a statistical or mathematical model that includes both x and y as variables, you might want to include something that represents this linkage; it might improve the predictive power of the model.

As a caveat, because I am not a statistics expert, I could have this wrong.
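
The "does the interval contain zero?" check amounts to a two-sided test of ρ = 0 at the 5% level, provided the interval and the test come from the same approximation. A minimal sketch, assuming both are based on the Fisher z transformation (the common t test for r = 0 can disagree very slightly near the boundary); the function names and the (r, n) pairs are illustrative only. The last two cases show that the same r = 0.30 can give an interval that includes or excludes zero depending on the sample size.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def fisher_ci(r, n, z_crit=Z_975):
    """Approximate 95% CI for rho via the Fisher z transformation."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def fisher_p_value(r, n, rho0=0.0):
    """Two-sided p-value for H0: rho = rho0, using the Fisher z statistic."""
    stat = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))


for r, n in [(0.571, 27), (0.30, 27), (0.30, 60)]:
    lo, hi = fisher_ci(r, n)
    p = fisher_p_value(r, n)
    excludes_zero = lo > 0 or hi < 0
    # Same decision either way: excludes_zero is True exactly when p < 0.05
    print(f"r={r}, n={n}: CI=({lo:.2f}, {hi:.2f}), p={p:.3f}, excludes 0: {excludes_zero}")
```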

Jeff Weiss
  • This is a good start, but one aspect will give many readers pause: it is the implicit equivalence of "uncorrelated" with "unrelated." There are many important counterexamples: they go under the rubric of "nonlinear associations." – whuber Jan 08 '22 at 17:36
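
whuber's point about nonlinear associations can be seen with a toy data set, made up purely for illustration: y is an exact function of x (y = x²), yet with x placed symmetrically around zero the Pearson correlation is exactly zero. `pearson_r` is a hand-written helper for the plain sample correlation, not something from the posts above.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation: measures *linear* association only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]   # symmetric around zero (toy data)
ys = [x ** 2 for x in xs]       # y is completely determined by x
print(pearson_r(xs, ys))        # prints 0.0
```

So "uncorrelated" only rules out a straight-line trend; a zero-containing interval for r does not show that x and y are unrelated.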