
There's a regression model $Y = a + bX$ with $a = 1.6$ and $b = 0.4$, which has a correlation coefficient of $r = 0.60302$.

If $X$ and $Y$ are then switched around and the equation becomes $X = c + dY$ where $c=0.4545$ and $d=0.9091$, it also has an $r$ value of $0.60302$.

I'm hoping someone can explain why $(d\times b)^{0.5}$ is also $0.60302$.

Silverfish
Mike

3 Answers


$b = r \; \text{SD}_y / \text{SD}_x$ and $d = r \; \text{SD}_x / \text{SD}_y$, so $b\times d = r^2$.
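A quick numerical check of this identity, using made-up sample data since the original sample isn't given:

```python
import numpy as np

# Made-up data purely to illustrate the identity.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.6 + 0.4 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]
b = r * y.std() / x.std()  # slope of the y-on-x regression
d = r * x.std() / y.std()  # slope of the x-on-y regression

print(np.isclose(b * d, r ** 2))  # prints True
```

The standard deviation ratios cancel in the product, leaving $r^2$ regardless of the data.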

Many statistics textbooks would touch on this; I like Freedman et al., Statistics. See also here and this Wikipedia article.

Karl

Have a look at Thirteen Ways to Look at the Correlation Coefficient; ways 3, 4, and 5 will be of most interest to you.

gung - Reinstate Monica
Tomas
  • This should probably have been a comment. Note that the link has gone dead. I have updated the link & provided a full citation. Can you elaborate, or provide any additional information so this will still be valuable even if the link goes dead again? – gung - Reinstate Monica Oct 05 '15 at 17:58
  • The Rodgers & Nicewander article is summarized on our site at http://stats.stackexchange.com/q/70969/22228. – whuber Oct 06 '15 at 20:34

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\SD}{SD}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\sgn}{sgn}$ $\newcommand{\nsum}{\sum_{i=1}^{n}}$

Recall that many introductory texts define

$$S_{xy} = \nsum (x_i - \bar x)(y_i - \bar y)$$

Then substituting $x$ for $y$ we have $S_{xx} = \nsum (x_i - \bar x)^2$, and similarly $S_{yy} = \nsum (y_i - \bar y)^2$.

Formulae for the correlation coefficient $r$, the slope of the $y$-on-$x$ regression (your $b$) and the slope of the $x$-on-$y$ regression (your $d$) are often given as:

$$ \begin{align} r &= \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} \tag{1} \\ \hat \beta_{y\text{ on }x} &= \frac{S_{xy}}{S_{xx}} \tag{2} \\ \hat \beta_{x\text{ on }y} &= \frac{S_{xy}}{S_{yy}} \tag{3} \end{align} $$

Then multiplying $(2)$ and $(3)$ clearly gives the square of $(1)$:

$$\hat \beta_{y\text{ on }x} \cdot \hat \beta_{x\text{ on }y} = \frac{S_{xy}^2}{S_{xx}S_{yy}} = r^2 $$
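As a sketch, the sums-of-squares formulae $(1)$–$(3)$ can be computed directly and the identity checked numerically. The data here are arbitrary; any sample would do:

```python
import numpy as np

# Arbitrary example data; the identity holds for any sample.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)

r = Sxy / np.sqrt(Sxx * Syy)   # equation (1)
b = Sxy / Sxx                  # equation (2), slope of y on x
d = Sxy / Syy                  # equation (3), slope of x on y

print(np.isclose(b * d, r ** 2))  # prints True
```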

Alternatively, the numerators and denominators of the fractions in $(1)$, $(2)$ and $(3)$ are often divided by $n$ or $(n-1)$ so that things are framed in terms of sample or estimated variances and covariances. For instance, from $(1)$, the estimated correlation coefficient is just the estimated covariance, scaled by the estimated standard deviations:

$$\begin{align} r &= \widehat \Corr(X,Y) = \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \tag{4} \\ \hat \beta_{y\text{ on }x} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(X)}} \tag{5} \\ \hat \beta_{x\text{ on }y} &= \frac{\widehat \Cov(X,Y)}{\widehat{\Var(Y)}} \tag{6} \end{align}$$

We then immediately find from multiplying $(5)$ and $(6)$ that

$$\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y} = \frac{\widehat \Cov(X,Y)^2}{\widehat{\Var(X)}\widehat{\Var(Y)}} = \left( \frac{\widehat \Cov(X,Y)}{\widehat{\SD(X)}\widehat{\SD(Y)}} \right)^2 = r^2 $$


We might instead have rearranged $(4)$ to write the covariance as a "scaled-up" correlation:

$$\widehat \Cov(X,Y) = r\cdot \widehat{\SD(X)} \widehat{\SD(Y)} \tag{7}$$

Then by substituting $(7)$ into $(5)$ and $(6)$ we could rewrite the regression coefficients as $\hat \beta_{y\text{ on }x} = r \frac{\widehat{\SD(Y)}}{\widehat{\SD(X)}}$ and $\hat \beta_{x\text{ on }y} = r \frac{\widehat{\SD(X)}}{\widehat{\SD(Y)}}$. Multiplying these together would also produce $r^2$, and this is @Karl's solution. Writing the slopes in this way helps explain how we can see the correlation coefficient as a standardized regression slope.
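A short sketch of this "standardized slope" view, again with made-up data: multiplying the $y$-on-$x$ slope by the ratio of estimated standard deviations recovers $r$.

```python
import numpy as np

# Made-up data to illustrate the standardized-slope identity.
rng = np.random.default_rng(2)
x = rng.normal(size=80)
y = 3.0 + 0.7 * x + rng.normal(size=80)

sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)
b = np.cov(x, y)[0, 1] / x.var(ddof=1)  # slope of y on x, as in equation (5)
r = np.corrcoef(x, y)[0, 1]

# Standardizing the slope recovers the correlation coefficient.
print(np.isclose(b * sd_x / sd_y, r))  # prints True
```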


Finally, note that in your case $r = \sqrt{bd} = \sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$, but only because your correlation was positive. If the correlation were negative, you would have to take the negative root.

To work out whether your correlation is positive or negative, you simply need to look at the sign (plus or minus) of either regression coefficient; it doesn't matter whether you use the $y$-on-$x$ or the $x$-on-$y$ slope, as their signs will be the same. So you can use the formula:

$$ r = \sgn(\hat \beta_{y\text{ on }x}) \sqrt{\hat \beta_{y\text{ on }x} \hat \beta_{x\text{ on }y}}$$

where $\sgn$ is the signum function, i.e. is $+1$ if the slope is positive and $-1$ if the slope is negative.
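As a sketch with negatively correlated made-up data, this signum formula recovers $r$ where the bare square root alone would only give $|r|$:

```python
import numpy as np

# Made-up data with a negative relationship, so r < 0.
rng = np.random.default_rng(3)
x = rng.normal(size=60)
y = 5.0 - 1.2 * x + rng.normal(size=60)

cov = np.cov(x, y)[0, 1]
b = cov / x.var(ddof=1)  # slope of y on x
d = cov / y.var(ddof=1)  # slope of x on y
r = np.corrcoef(x, y)[0, 1]

# sqrt(b * d) gives |r|; the sign of either slope restores the sign of r.
print(np.isclose(np.sign(b) * np.sqrt(b * d), r))  # prints True
```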

Silverfish
  • You might find [this answer](http://stats.stackexchange.com/a/20556/6633) of mine to be of interest even though it does not explicitly address the question asked here. – Dilip Sarwate Oct 06 '15 at 19:09