
There's a regression model $Y = a + bX$ with $a = 1.6$ and $b = 0.4$, which has a correlation coefficient of $r = 0.60302$.

If $X$ and $Y$ are then switched around and the equation becomes $X = c + dY$ where $c=0.4545$ and $d=0.9091$, it also has an $r$ value of $0.60302$.

I'm hoping someone can explain why $(d\times b)^{0.5}$ is also $0.60302$.

Silverfish
Mike

3 Answers


$b = r \; \text{SD}_y / \text{SD}_x$ and $d = r \; \text{SD}_x / \text{SD}_y$, so $b\times d = r^2$.
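A quick numerical check of this identity, using made-up sample data since the original sample isn't given:

```python
import numpy as np

# Made-up data purely to illustrate the identity.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.6 + 0.4 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
b = r * y.std() / x.std()  # slope of the y-on-x regression
d = r * x.std() / y.std()  # slope of the x-on-y regression

print(np.isclose(b * d, r ** 2))  # prints True
```

The standard deviation ratios cancel in the product, leaving $r^2$ regardless of the data.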

Many statistics textbooks would touch on this; I like Freedman et al., Statistics. See also here and this Wikipedia article.

Karl

Have a look at Thirteen Ways to Look at the Correlation Coefficient; ways 3, 4, and 5 will be of most interest to you.

gung - Reinstate Monica
Tomas
  • This should probably have been a comment. Note that the link has gone dead. I have updated the link & provided a full citation. Can you elaborate, or provide any additional information so this will still be valuable even if the link goes dead again? – gung - Reinstate Monica Oct 05 '15 at 17:58
  • The Rodgers & Nicewander article is summarized on our site at http://stats.stackexchange.com/q/70969/22228. – whuber Oct 06 '15 at 20:34

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\SD}{SD}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\sgn}{sgn}$ $\newcommand{\nsum}{\sum_{i=1}^{n}}$

Recall that many introductory texts define

$$S_{xy} = \nsum (x_i - \bar x)(y_i - \bar y)$$

Then substituting $x$ for $y$ we have $S_{xx} = \nsum (x_i - \bar x)^2$, and similarly $S_{yy} = \nsum (y_i - \bar y)^2$.

Formulae for the correlation coefficient $r$, the slope of the $y$-on-$x$ regression (your $b$) and the slope of the $x$-on-$y$ regression (your $d$) are often given as:

$$ \begin{align} r &= \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \tag{1} \\ \hat \beta_{y\text{ on }x} &= \frac{S_{xy}}{S_{xx}} \tag{2} \\ \hat \beta_{x\text{ on }y} &= \frac{S_{xy}}{S_{yy}} \tag{3} \end{align} $$

Then multiplying $(2)$ and $(3)$ clearly gives the square of $(1)$:

$$\hat \beta_{y\text{ on }x} \cdot \hat \beta_{x\text{ on }y} = \frac{S_{xy}^2}{S_{xx}S_{yy}} = r^2 $$
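As a sketch, the sums-of-squares formulae $(1)$–$(3)$ can be computed directly and the identity checked numerically. The data here are arbitrary; any sample would do:

```python
import numpy as np

# Arbitrary example data; the identity holds for any sample.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)

r = Sxy / np.sqrt(Sxx * Syy)   # equation (1)
b = Sxy / Sxx                  # equation (2), slope of y on x
d = Sxy / Syy                  # equation (3), slope of x on y

print(np.isclose(b * d, r ** 2))  # prints True
```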

Alternatively, the numerators and denominators of the fractions in $(1)$, $(2)$ and $(3)$ are often divided by $n$ or $(n-1)$ so that things are framed in terms of sample or estimated variances and covariances. For instance, from $(1)$, the estimated correlation coefficient is just the estimated covariance, scaled by the estimated standard deviations:

$$\begin{align} r &= \widehat \Corr(X,Y) = \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \tag{4} \\ \hat \beta_{y\text{ on }x} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(X)}} \tag{5} \\ \hat \beta_{x\text{ on }y} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(Y)}} \tag{6} \end{align}$$

We then immediately find from multiplying $(5)$ and $(6)$ that

$$\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y} = \frac{\widehat \Cov(X,Y)^2}{\widehat{\Var(X)}\widehat{\Var(Y)}} = \left( \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \right)^2 = r^2 $$


We might instead have rearranged $(4)$ to write the covariance as a "scaled-up" correlation:

$$\widehat \Cov(X,Y) = r\cdot \widehat{\SD(X)} \widehat{\SD(Y)} \tag{7}$$

Then by substituting $(7)$ into $(5)$ and $(6)$ we could rewrite the regression coefficients as $\hat \beta_{y\text{ on }x} = r \frac{\widehat{\SD(Y)}}{\widehat{\SD(X)}}$ and $\hat \beta_{x\text{ on }y} = r \frac{\widehat{\SD(X)}}{\widehat{\SD(Y)}}$. Multiplying these together would also produce $r^2$, and this is @Karl's solution. Writing the slopes in this way helps explain how we can see the correlation coefficient as a standardized regression slope.
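A short sketch of this "standardized slope" view, again with made-up data: multiplying the $y$-on-$x$ slope by the ratio of estimated standard deviations recovers $r$.

```python
import numpy as np

# Made-up data to illustrate the standardized-slope identity.
rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 3.0 + 0.7 * x + rng.normal(size=80)

sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)
b = np.cov(x, y)[0, 1] / x.var(ddof=1)  # slope of y on x, as in equation (5)
r = np.corrcoef(x, y)[0, 1]

# Standardizing the slope recovers the correlation coefficient.
print(np.isclose(b * sd_x / sd_y, r))  # prints True
```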


Finally, note that in your case $r = \sqrt{bd} = \sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$, but only because your correlation was positive. If the correlation were negative, you would have to take the negative root.

To work out whether your correlation is positive or negative, you simply need to look at the sign (plus or minus) of either regression coefficient; it doesn't matter whether you use the $y$-on-$x$ or the $x$-on-$y$ slope, as their signs will be the same. So you can use the formula:

$$ r = \sgn(\hat \beta_{y\text{ on }x}) \sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$$

where $\sgn$ is the signum function, i.e. is $+1$ if the slope is positive and $-1$ if the slope is negative.
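As a sketch with negatively correlated made-up data, this signum formula recovers $r$ where the bare square root alone would only give $|r|$:

```python
import numpy as np

# Made-up data with a negative relationship, so r < 0.
rng = np.random.default_rng(3)
x = rng.normal(size=60)
y = 5.0 - 1.2 * x + rng.normal(size=60)

cov = np.cov(x, y)[0, 1]
b = cov / x.var(ddof=1)  # slope of y on x
d = cov / y.var(ddof=1)  # slope of x on y
r = np.corrcoef(x, y)[0, 1]

# sqrt(b * d) gives |r|; the sign of either slope restores the sign of r.
print(np.isclose(np.sign(b) * np.sqrt(b * d), r))  # prints True
```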

Silverfish
  • You might find [this answer](http://stats.stackexchange.com/a/20556/6633) of mine to be of interest even though it does not explicitly address the question asked here. – Dilip Sarwate Oct 06 '15 at 19:09