If one knows the correlation coefficient for two variables, and also the number of observations, but does not have access to the raw data, how does one calculate a p-value?

[EDIT:] Please understand that I am asking how this is done, not requesting the name of a software function that would do it for me, as some have assumed. An ideal answer to this question would be a procedure that one could theoretically follow, using pen and paper if necessary, to get from a correlation coefficient and a number of observations to an accurate p-value (regardless of how long that procedure would actually take if carried out with pen and paper). [EDIT ENDS]

The following answer explains how to calculate the t score for a correlation, but relies on an Excel function called tdist to derive a p-value from that t score (which is unhelpful if one does not happen to be using Excel):

https://stats.stackexchange.com/a/120235/248482

  • @whuber I don't feel that it was fair to close this question for that reason. As my edit (made five days ago) makes clear, I'm not asking about programming but about the mathematical question of how to calculate a p-value from a correlation coefficient given the number of observations. The only question on this site that addresses that problem (a) confusingly phrases it in relation to specific data and (b) has only a single answer, which does not answer the general mathematical question because it makes sense only if one is using a specific software package, i.e. Microsoft Excel. – Westcroft_to_Apse Jun 13 '19 at 11:39
  • No problem: just edit your question to make the intentions clear and the community can vote to re-open it. – whuber Jun 13 '19 at 12:30
  • @whuber Thanks, but... I thought I already did? – Westcroft_to_Apse Jun 13 '19 at 13:56
  • You need to edit the question: a comment does not suffice, because not everyone reads the comments. – whuber Jun 13 '19 at 14:07
  • @whuber As explained in my comment, I _did_ edit the question (five days ago). Your explanation (above) suggests that my question 'EITHER... is not about statistics, machine learning, data analysis, data mining, or data visualization' (clearly not the case, as it's about p values) 'OR it focuses on programming, debugging, or performing routine operations within a statistical computing platform' (also not the case, as no statistical computing platform is mentioned except Excel, and the latter is mentioned only in context of a complaint that an existing answer requires Excel, which I do not). – Westcroft_to_Apse Jun 13 '19 at 14:59
  • I'm sorry, but the edited question--the one we're looking at this moment--does not reflect the points you have been making in your comments. Not only does it appear to seek just some software to perform the calculations, the existing reply unfortunately reinforces that impression. You can't do much about the latter, but you can easily edit the question to address the former. – whuber Jun 13 '19 at 16:31
  • @whuber Which words in my question “appear to just seek some software to perform the calculations”? – Westcroft_to_Apse Jun 13 '19 at 19:21
  • Every bit of it, especially because you reference an answer that already supplies a formula and an explanation. If you want something else, then you need to state explicitly what you're looking for. – whuber Jun 13 '19 at 19:30
  • @whuber The only reason I reference that answer is to explain why it is not adequate! It does not provide a formula or an explanation for how to calculate p values, it only provides a formula and an explanation for calculating t - and then says (and I quote) 'And then you use the tdist() function in Excel'. This is the problem. The explanation there is incomplete, and relies on a specific software package (which is exactly what we are _not_ supposed to do on Cross Validated). – Westcroft_to_Apse Jun 13 '19 at 19:56
  • Your edits bumped this thread into the reopen queue, but it was voted to be left closed. The required formulas are in the linked thread. You won't be able to calculate the t-distribution with pencil and paper, but it can be calculated by basic functions in a wide range of software (even Excel). You could also look up values in a t-table in the back of any stats textbook. – gung - Reinstate Monica Jun 13 '19 at 20:33
  • @gung Okay so is there a rule now that if something's in a stats textbook it can't be covered on Cross Validated? Also, the question in the linked thread is phrased as follows: 'I would like to understand how people add the P value on a figure for means (Y axis) by age, volume or any other variable (x axis). How did they calculate the P value here? Please check the following figure'. How on earth is that better than my question, which asks directly about the underlying math? – Westcroft_to_Apse Jun 13 '19 at 20:43
  • I'm afraid I still cannot find anything in your current post that "asks directly about the underlying math." It still references Excel. For the underlying math, see https://stats.stackexchange.com/questions/394978 for a numerical method and [Student t distribution cdf](https://stats.stackexchange.com/search?q=student+t+distribution+cdf) for mathematical formulations (as integrals). – whuber Jun 13 '19 at 21:27
  • @whuber The only reference to Excel is when I say that I _don't_ want an answer that references Excel!!!!!!!!!!! (These are my words, my _actual words_: 'The following answer... relies on an Excel function... which is unhelpful if one does not happen to be using Excel'). As for my asking about math, here is my question (again, in _my actual words_): 'If one knows the correlation coefficient for two variables, and also the number of observations, but does not have access to the raw data, how does one calculate a p-value?' There's no reference to software. I am asking how the calculation is done – Westcroft_to_Apse Jun 13 '19 at 21:37
  • I'm afraid I can't help you any further. Complaining about the status of your post isn't going to distinguish it from the apparent duplicates; only editing the post can do that. – whuber Jun 13 '19 at 21:39
  • @whuber Editing it is not going to help if you're going to respond to what you think I write rather than what I actually write. But perhaps you can give me some advice. How would _you_ write the question 'How does one calculate a p-value from a correlation coefficient given the number of observations?' in such a way as not to have people such as yourself assume that it is somehow secretly a question about software (rather than about calculating a p-value from a correlation coefficient given the number of observations)? – Westcroft_to_Apse Jun 13 '19 at 21:42
  • I'm unsure, because you still haven't disclosed the nature of the answer you seek: do you want the name of a software function? A mathematical formula? A numerical algorithm? At this point you have links to all three kinds of answers and I'm at a loss to know what else you might have in mind. You seem interested only in debating the status of your question rather than revealing your intentions. I have done what I can to help and am sorry it was unsuccessful. – whuber Jun 13 '19 at 21:45
  • @whuber The only reason I've linked to those answers is to explain why they are not helpful answers to my question as asked. On the other StackExchange sites, that's good practice. I have no idea why it isn't considered good practice on this one. I have not mentioned software except to say that the only thing on this site that currently might be mistaken for an answer to my question is not an answer because it relies on Excel. I would have thought that a formula was the most efficient answer to my question, but if you'd prefer to give an algorithm, that's also good. – Westcroft_to_Apse Jun 13 '19 at 21:51
  • I linked to both: please take a look. For your convenience, [here's the algorithm](https://stats.stackexchange.com/questions/394978) and [here is one formula](https://stats.stackexchange.com/questions/46808), [here is another](https://stats.stackexchange.com/questions/57231), [here is an approximation](https://stats.stackexchange.com/questions/246375/), and [here are references to a generalization](https://stats.stackexchange.com/questions/104296). – whuber Jun 13 '19 at 22:01
  • @whuber I have now read those. None of them is an answer to the question of how to calculate a p-value from a correlation coefficient given the number of observations. Perhaps, given time, one could piece together an answer to that question from the knowledge contained in them. But what exactly is your objection to having the whole of that answer in one place? – Westcroft_to_Apse Jun 13 '19 at 22:08
  • The whole of that answer *is* in one place: the thread where you began (as kindly pointed out earlier by @gung). I haven't objected to anything, by the way: I have only been asking you to formulate an answerable question. If you continue only to comment and not to attempt that, it will be difficult not to conclude that you're just trolling us and aren't really interested in asking or getting answers to any substantial question. That's why I'm bowing out of this dialog. – whuber Jun 13 '19 at 22:16
  • @whuber It _isn't_ in the original thread, because that answer only gets as far as calculating t and then says 'We then use the tdist() function to find the associated p'. What I'm asking is, how would you find the associated p _without relying on Excel to do the last bit of the work_. (It's a shame that you can only see this as trolling, rather than respond to what I'm writing in good faith.) – Westcroft_to_Apse Jun 13 '19 at 22:22

1 Answer

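In R, the equivalent of Excel's tdist() function is pt(q, df). Note that pt() returns the lower-tail probability, so with df = n - 2 the two-sided p-value for a t score t is 2 * pt(-abs(t), df).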
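For readers who want the calculation itself rather than a function name, here is a minimal sketch in plain Python of one way to go from r and n to a two-sided p-value. It assumes the usual test of zero correlation, for which t = r * sqrt((n - 2) / (1 - r^2)) follows a Student t distribution with n - 2 degrees of freedom. The area under the t density is approximated with Simpson's rule, which is just one simple numerical choice and not necessarily what Excel or R do internally. The names `t_density`, `correlation_p_value` and `steps` are illustrative, not taken from any library.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so the gamma functions do not overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def correlation_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0, using only r and n.

    steps must be even (a requirement of Simpson's rule).
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    # Step 1: convert the correlation coefficient to a t score.
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Step 2: Simpson's rule for the area under the density between 0 and t.
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3
    # Step 3: by symmetry, the two tails together hold 1 - 2 * central_area.
    return max(0.0, 1 - 2 * central_area)


print(correlation_p_value(0.9, 4))
```

A quick sanity check: for n = 4 the t distribution has 2 degrees of freedom and the two-sided p-value reduces to 1 - |r|, so the call above should print a value very close to 0.1. Because the result is computed as one minus an area, very small p-values lose relative precision; integrating the upper tail directly, or using the incomplete beta function, would be more accurate in that range.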

Abdoul Haki