
If my understanding is correct, then

  1. the test on a regression slope in a simple bivariate regression - i.e. the test of $\mathcal{H}_0$: $b = 0$ in $Y' = a + bX$ and
  2. the test of a correlation, i.e. $\mathcal{H}_0$ : $\rho=0$

appear to involve different assumptions. The first assumes normality of the errors (the conditional distribution of Y given X), while the second is reported to assume bivariate normality of X and Y. Yet the tests produce identical p-values in every case I've ever seen, and several trustworthy sources (e.g. David Howell's psych stats text, van Belle et al.'s biostats text) assert that they are the same test.
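
For what it's worth, the two test statistics are algebraically identical: the slope statistic $\hat b/\operatorname{SE}(\hat b)$ and the correlation statistic $r\sqrt{n-2}/\sqrt{1-r^2}$ are the same number, referred to a t distribution on $n-2$ df. A minimal check with simulated data (my own sketch, assuming `numpy` and `scipy` are available) reproduces the identical p-values:

```python
# Minimal sketch (simulated data): the slope test in Y ~ X and the test of
# H0: rho = 0 give the same p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 0.3 * x + rng.normal(size=50)

reg = stats.linregress(x, y)          # slope, intercept, rvalue, pvalue, stderr
r, p_corr = stats.pearsonr(x, y)      # correlation test of H0: rho = 0

# Both are t-tests on n - 2 df with t = r * sqrt(n - 2) / sqrt(1 - r**2)
print(reg.pvalue, p_corr)             # agree up to floating-point rounding
```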

Now, bivariate normality (as far as I can deduce) implies that the conditional distribution of Y given X is normal with constant variance (equal to $\operatorname{var}(Y)(1-\rho^2)$), which is the stated assumption in the regression slope test. So is it the case that bivariate normality is not truly required for the test of the correlation, and that the narrower assumption about the conditional distribution is the only one needed?
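
For reference, the bivariate-normal fact I'm relying on (a standard result) is

$$Y \mid X = x \;\sim\; \mathcal{N}\!\left(\mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x-\mu_X),\; \sigma_Y^2(1-\rho^2)\right),$$

so the conditional mean is linear in $x$ and the conditional variance, $\sigma_Y^2(1-\rho^2) = \operatorname{var}(Y)(1-\rho^2)$, does not depend on $x$.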

  • The usual (F or t) test of the coefficient in simple linear regression `Y ~ X` assumes error normality and homoscedasticity for Y. But a test of the linear correlation implies the same thing in the opposite regression `X ~ Y` too, and that, if I'm not mistaken, holds only when the distribution is bivariate normal. So, when you test $r$ by the same approach as you do in regression, bivariate normality is the assumption. (But you could test $r$ in alternative ways that don't require normality: permutation / Monte Carlo / bootstrap; a minimal permutation sketch appears after these comments.) – ttnphns Nov 10 '15 at 13:59
  • First comment was too long - I'll try again. Thanks ttnphns. I'm confused, however, as to why linear correlation testing would be described as having assumptions on errors in both directions (I agree, that is my interpretation as well) when the hypothesis test is computationally the same. How could the computations be valid in some cases (regression use when X is non-normally distributed) but not in others (correlation use when X is non-normally distributed) when they are exactly the same? Or are the tests somehow telling me different things, even though people describe them as the same? – J Taylor Nov 10 '15 at 14:26
  • @JTaylor see http://stats.stackexchange.com/questions/2125/whats-the-difference-between-correlation-and-simple-linear-regression or http://stats.stackexchange.com/questions/32464/how-does-the-correlation-coefficient-differ-from-regression-slope; they should clear things up a little bit. – Tim Nov 10 '15 at 14:29
  • Testing a correlation coefficient for significance is more problematic than testing a regression beta (although the beta of a simple regression equals the correlation, numerically, as a statistic). Yes, because we have to account for the fact that errors involve both Y and X in this case. If you are ready to assume both-way error normality (and hence bivariate normality) you _may use_ for $r$ the same F-test approach/formula (to test against $\rho=0$) as for the beta. But generally, testing $r$ and testing beta are two different tasks. – ttnphns Nov 10 '15 at 14:44
  • @Tim - thanks for the pointers. I think you were trying to clear up confusion about what regression is vs. correlation, and how `Y ~ X` differs from `X ~ Y`. I think I have a handle on that (though I could be mistaken). I don't see how the links address my question, though. If I say that I want to test $\mathcal{H}_0: \rho=0$, how would you know which assumption to check (bivariate normality or conditional normality)? But why would that matter if the test is exactly the same? – J Taylor Nov 10 '15 at 14:54
  • @ttnphns - if I'm interpreting you correctly, then what you are saying is that it is conceivable to have a situation in which you are content to test a regression slope on `Zy ~ Zx` but not content to test a correlation coefficient of the two variables (because you have conditional normality, but not bivariate normality). This would seem to imply that you can have a slope without having a linear relationship. Is this a valid interpretation (even though I'm still trying to wrap my head around what it would mean)? – J Taylor Nov 10 '15 at 15:17
  • @JTaylor, sorry, I was in a bit of a hurry and probably at least partly wrong. I've just recalled (while reading your comment) that if one of the two variables is dichotomous (while the other is interval) - which instantly precludes the cloud from being bivariate normal - the usual F test for $r$ will still be valid and is equal to the t-test between the two groups (demonstrated in the second sketch after these comments). – ttnphns Nov 10 '15 at 16:02
  • And still, I continue to think that when _both_ X and Y are seen as random variables (or at least as random continuous variables), testing $r$ via the F/t distribution assumes bivariate normality (in the previous comment, the dichotomous variable is presumed fixed, not random, so the situation degenerates into simple regression / ANOVA). – ttnphns Nov 10 '15 at 16:52
  • @ttnphns - I was beginning to wonder if that was the distinction. So the distinction between the tests (even though they are mechanically the same) has to do with extension of the results. If you can only assume conditional normality of Y on X, then you have to treat X as fixed, which would mean you could only generalize the test result to cases in which the same X values are sampled. Whereas if you have bivariate normality, then the result can be generalized to a new sample in X,Y space. Just speculating. – J Taylor Nov 10 '15 at 19:09
  • Yes. Or to put it the other way around. The starting point is that in _regression_ (classical OLS) you always take the independent variable X as if fixed (no error variation); only Y is random. Hence, the assumptions (normality etc.) in significance testing apply to the errors about the modeled Y. When you speak of _correlation_ (its significance testing) and you aren't ready to declare one of the two variables fixed, then you have to accept that X is also random (and could be asked to be back-predicted from Y). So, for testing $r$, another, stricter assumption - bivariate normality - enters. That's how I was thinking through my comments. – ttnphns Nov 10 '15 at 20:07
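
The permutation approach mentioned in the first comment can be sketched as follows (simulated data; this implementation is my own illustration, assuming `numpy` and `scipy` are available):

```python
# Permutation test of H0: rho = 0 - no normality assumption needed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(size=60)                 # deliberately non-normal X
y = 0.5 * x + rng.exponential(size=60)

r_obs, _ = stats.pearsonr(x, y)

n_perm = 10_000
r_perm = np.array([stats.pearsonr(x, rng.permutation(y))[0]
                   for _ in range(n_perm)])
# Two-sided p-value: share of permuted |r| at least as large as |r_obs|
p_perm = (np.abs(r_perm) >= abs(r_obs)).mean()
print(r_obs, p_perm)
```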
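
And a second sketch for the dichotomous case raised later in the comments: with one dichotomous variable, the usual test of $r$ (here via the point-biserial correlation) gives the same p-value as the pooled two-sample t-test. Again simulated data and my own illustration, assuming `scipy`:

```python
# With a dichotomous X, the t-test of r equals the pooled two-group t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g = rng.integers(0, 2, size=80)              # dichotomous variable
y = 2.0 + 1.0 * g + rng.normal(size=80)

r, p_r = stats.pointbiserialr(g, y)          # same machinery as pearsonr
t, p_t = stats.ttest_ind(y[g == 1], y[g == 0], equal_var=True)
print(p_r, p_t)                              # agree up to rounding
```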
