I ran two single-variable linear regressions, $A$ and $B$. $A$ had a relatively large effect size, $R^2 = .68$, while $B$ had $R^2 = .10$. Regressing the explanatory variable from $A$ ($a$) on the explanatory variable from $B$ ($b$) gives $R^2 = .44$. The correlation between these two variables is $.66$. Is there a relationship between their correlation in the Pearson correlation matrix and the $R^2$ from that regression? Is it at all strange that, despite the variables being correlated, their regressions (on the same response variable) yield substantially different effect sizes?

114
  • Did you notice that $0.66^2 = 0.44$? As far as your last remark goes, there's nothing strange about that because (presumably) both regressions involved an unnamed dependent variable I will call $Y$: the $0.68$ and $0.10$ values describe *only* how $A$ and $B$ are related to $Y$, not to each other. You can find [plenty of explanations of such phenomena](http://stats.stackexchange.com/search?tab=votes&q=regression%20significant) on our site. [This thread on correlation](http://stats.stackexchange.com/questions/5747) also looks relevant. – whuber Jun 18 '14 at 20:37
  • @whuber I did, and thank you for the links. So a strong correlation between two variables need not imply that they produce the same 'strength' of effect when regressed on the same dependent variable. – 114 Jun 18 '14 at 20:42
  • An $R^2$ of 0.44 isn't particularly strong. It's perfectly possible to have three variables with $\hat\rho(y,a)=\sqrt{0.68}$, $\hat\rho(y,b)=\sqrt{0.10}$, and $\hat\rho(a,b)=\sqrt{0.44}$, for example. – Glen_b Jun 18 '14 at 22:42

1 Answer

The important restriction is that the correlation matrix must be positive semi-definite (all eigenvalues are non-negative, so all quadratic forms are non-negative). For your example, where the $R^2$ values with the response variable are $0.68$ and $0.10$, the possible correlations between the two predictors range from about $-0.276$ to $0.797$ (taking both predictors to be positively correlated with the response).
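
As a rough check on that range, here is a minimal Python sketch. It assumes both predictors are positively correlated with the response, and it uses the closed-form bounds obtained by requiring the determinant of the $3 \times 3$ correlation matrix to be non-negative. The function name `predictor_correlation_range` is hypothetical, introduced only for illustration.

```python
import math


def predictor_correlation_range(r2_ya, r2_yb):
    """Bounds on corr(a, b) that keep the (y, a, b) correlation matrix PSD.

    Assumption: both predictors correlate positively with the response,
    so the correlations are the positive square roots of the R^2 values.
    """
    r_ya = math.sqrt(r2_ya)
    r_yb = math.sqrt(r2_yb)
    centre = r_ya * r_yb
    # Half-width of the admissible interval for corr(a, b).
    half_width = math.sqrt((1.0 - r2_ya) * (1.0 - r2_yb))
    return centre - half_width, centre + half_width


low, high = predictor_correlation_range(0.68, 0.10)
print(f"corr(a, b) must lie between {low:.3f} and {high:.3f}")
print("observed 0.66 is admissible:", low <= 0.66 <= high)
```

The observed correlation of $0.66$ sits inside the computed interval, which is the sense in which the reported values are not unusual.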

Clearly, if the predictors had a correlation of $1$, they would have to have the same relationship with the response. If they were highly correlated ($>0.8$), their relationships with the response would need to be more similar. But there is nothing surprising about the values that you state.
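
One way to see this, sketched with the notation $r_{ya}$, $r_{yb}$, $r_{ab}$ for the three pairwise correlations (symbols introduced here for illustration, not taken from the question): for a $3 \times 3$ correlation matrix whose entries all lie in $[-1, 1]$, positive semi-definiteness amounts to a non-negative determinant,

$$1 - r_{ya}^2 - r_{yb}^2 - r_{ab}^2 + 2\, r_{ya}\, r_{yb}\, r_{ab} \ge 0,$$

which, solved for $r_{ab}$, gives

$$r_{ya} r_{yb} - \sqrt{(1 - r_{ya}^2)(1 - r_{yb}^2)} \;\le\; r_{ab} \;\le\; r_{ya} r_{yb} + \sqrt{(1 - r_{ya}^2)(1 - r_{yb}^2)}.$$

Setting $r_{ab} = 1$ in the first inequality reduces it to $-(r_{ya} - r_{yb})^2 \ge 0$, which forces $r_{ya} = r_{yb}$: perfectly correlated predictors must have the same correlation with the response.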

Greg Snow
  • That's good to know. You mean that the correlation not having an effect on the outcome of the regressions is a result of the correlation matrix being positive semi-definite? Is there a proof somewhere for that result? – 114 Jun 19 '14 at 13:25
  • @114, the correlation does have an effect on the outcome of the regression; I was just showing that the range of possibilities is quite large. – Greg Snow Jun 19 '14 at 15:28
  • Oh, what did you mean by the restriction that the matrix is positive semi-definite then? – 114 Jun 19 '14 at 16:53
  • @114, any real correlation matrix will always be positive semi-definite (some estimates of the correlation matrix based on available cases may violate this, but those are not possible as true correlation matrices). It is possible to generate a dataset with any correlation matrix that is positive semi-definite, and impossible to generate data with a correlation matrix that is not. I just showed that what you observed is not unusual: it falls in the set of values that give a positive semi-definite matrix (but you were correct that there are values that would not work). – Greg Snow Jun 19 '14 at 17:05
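
The last comment's claim can be illustrated with a small NumPy sketch. This is only one possible construction, not something taken from the thread: it whitens random normal draws and then recolours them with a Cholesky factor so that the sample correlation matrix matches the target. The sample size, seed, and the value $0.9$ used for the inadmissible case are arbitrary choices made for illustration.

```python
import numpy as np

# Correlations from the question, in the order (y, a, b).
# Assumption: both predictors correlate positively with the response.
r_ya, r_yb, r_ab = np.sqrt(0.68), np.sqrt(0.10), 0.66
target = np.array([[1.0, r_ya, r_yb],
                   [r_ya, 1.0, r_ab],
                   [r_yb, r_ab, 1.0]])

# A valid correlation matrix has no negative eigenvalues.
min_eigenvalue = np.linalg.eigvalsh(target).min()

rng = np.random.default_rng(0)  # arbitrary seed
raw = rng.standard_normal((500, 3))

# Whiten: remove the sample means and the sample covariance of the draws.
centred = raw - raw.mean(axis=0)
sample_factor = np.linalg.cholesky(np.cov(centred, rowvar=False))
white = centred @ np.linalg.inv(sample_factor).T

# Recolour so the sample correlation matrix equals the target.
data = white @ np.linalg.cholesky(target).T
sample_corr = np.corrcoef(data, rowvar=False)

# In a single-predictor regression, R^2 is the squared correlation.
r2_y_on_a = sample_corr[0, 1] ** 2
r2_y_on_b = sample_corr[0, 2] ** 2
r2_a_on_b = sample_corr[1, 2] ** 2

# Replacing corr(a, b) with 0.9 pushes it outside the admissible range,
# so the matrix picks up a negative eigenvalue and no dataset can have it.
invalid = target.copy()
invalid[1, 2] = invalid[2, 1] = 0.9
invalid_min_eigenvalue = np.linalg.eigvalsh(invalid).min()

print(min_eigenvalue > 0, invalid_min_eigenvalue < 0)
print(round(r2_y_on_a, 2), round(r2_y_on_b, 2), round(r2_a_on_b, 2))
```

The generated columns reproduce the three $R^2$ values from the question, while the modified matrix with a predictor correlation of $0.9$ fails the eigenvalue check.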